Abstract

We study the asymptotics of large simple graphs constrained by the limiting density
of edges and the limiting subgraph density of an arbitrary fixed graph $H$. We prove
that, for all but finitely many values of the edge density, if the density of $H$ is
constrained to be slightly higher than that for the corresponding Erdős-Rényi graph, the
typical large graph is bipodal with parameters varying analytically with the densities.
Asymptotically, the parameters depend only on the degree sequence of $H$.

1 Introduction

We study the asymptotics of large, simple, labeled graphs constrained to have subgraph
densities $\varepsilon$ of edges, and $\tau$ of some fixed subgraph $H$ with $\ell \ge 2$ edges. To study the
asymptotics we use the graphon formalism of Lovász et al. [8, 9, 2, 1, 10] and the large
deviations theorem of Chatterjee and Varadhan [5], from which one can reduce the analysis
to the study of the graphons which maximize the entropy subject to the density constraints
[13, 14, 12, 6]. See definitions in Section 2.

The phase space is the subset of $[0,1]^2$ consisting of accumulation points of all pairs of
densities $\bar\tau = (\varepsilon,\tau)$ achievable by finite graphs. (See Figure 1 for the case where $H$ is a
triangle.) Within the phase space is the 'Erdős-Rényi curve' (ER curve) $\{(\varepsilon,\tau) \mid \tau = \varepsilon^\ell\}$,
attained when edges are chosen independently. In this paper we study the typical behavior
of large graphs for $\tau$ just above the ER curve. We will show that the qualitative behavior
of such graphs is the same for all choices of $H$ and for all but finitely many choices of $\varepsilon$
depending on $H$.

To be precise, we show that for fixed $H$, for $\varepsilon$ outside a finite set, and for $\tau$ close enough to
$\varepsilon^\ell$, there is a unique entropy-maximizing graphon (up to measure-preserving transformations
of the unit interval); furthermore it is bipodal and depends analytically on $(\varepsilon,\tau)$, implying
that the entropy is an analytic function of $(\varepsilon,\tau)$. In particular we prove the existence of
one or more well-defined thermodynamic phases just above the ER curve. This is the first
proof, as far as we know, of the existence of a phase in any constrained-density graphon
model, where by phase we mean a (maximal) open set in the phase space where the entropy
varies analytically with the constraint parameters. Conjecturally, phases form an open dense
subset of the phase space.

Figure 1: Boundary of the phase space for the edge/triangle model in solid lines. On the
right, the Erdős-Rényi curve is shown with dashes.

Here $c$, $p_{11}$, $p_{12}$ and $p_{22}$ (the parameters of a bipodal graphon: the size $c$ of the first
vertex cluster and the three values taken by the graphon) are constants taking values between 0 and 1.
We prove that as $\tau \searrow \varepsilon^\ell$, the parameters satisfy $c \to 0$ and $p_{22} \to \varepsilon$, while $p_{11}$ and $p_{12}$
approach the solutions of a problem in single-variable calculus. The inputs to that calculus
problem depend only on the degrees of the vertices of $H$.

We say that a finite graph $H$ is $k$-starlike if all the vertices of $H$ have degree $k$ or 1,
where $k > 1$ is a fixed integer. Examples of $k$-starlike graphs include $k$-stars (where one vertex
has degree $k$ and $k$ vertices have degree 1), and the complete graph on $k+1$ vertices. For fixed $k$, all
$k$-starlike graphs behave essentially the same for our asymptotics. We prove our results
first for $k$-stars, then apply perturbation theory to show that the differences between
different $k$-starlike graphs are irrelevant, and then prove the general case.

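To state our results more precisely, we need some notation. Let

$$S_0(w) = -\tfrac{1}{2}\,[\,w\log w + (1-w)\log(1-w)\,], \tag{2}$$

and define the graphon entropy (or entropy for short) of a graphon $g$ to be

$$s(g) = \int_0^1\!\!\int_0^1 S_0(g(x,y))\,dx\,dy. \tag{3}$$

Let

$$\psi_k(\varepsilon,\tilde\varepsilon) = \frac{2\,[S_0(\tilde\varepsilon) - S_0(\varepsilon) - S_0'(\varepsilon)(\tilde\varepsilon-\varepsilon)]}{\tilde\varepsilon^{k} - \varepsilon^{k} - k\varepsilon^{k-1}(\tilde\varepsilon-\varepsilon)}. \tag{4}$$

This function has a removable singularity at $\tilde\varepsilon = \varepsilon$, which we fill by defining

$$\psi_k(\varepsilon,\varepsilon) = \frac{2S_0''(\varepsilon)}{k(k-1)\varepsilon^{k-2}}. \tag{5}$$

For fixed $\varepsilon$, let $\zeta_k(\varepsilon)$ be the value of $\tilde\varepsilon$ that maximizes $\psi_k(\varepsilon,\tilde\varepsilon)$. (We will prove that this
maximizer is unique and depends continuously on $\varepsilon$.)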
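As an illustration of definitions (4) and (5), the following Python sketch evaluates $\psi_k$ and locates its maximizer $\zeta_k(\varepsilon)$ numerically. The function names, the golden-section search, the search interval and the cutoff used near the removable singularity are all assumptions made for this example and are not taken from the paper. The search presumes that $\psi_k(\varepsilon,\cdot)$ has a single interior maximum on $(0,1)$.

```python
import math

def S0(w):
    # Entropy density from (2); S0(0) = S0(1) = 0 by continuity.
    if w <= 0.0 or w >= 1.0:
        return 0.0
    return -0.5 * (w * math.log(w) + (1.0 - w) * math.log(1.0 - w))

def dS0(w):
    # First derivative of S0.
    return -0.5 * math.log(w / (1.0 - w))

def psi(k, eps, eps_t, cutoff=1e-5):
    # psi_k(eps, eps_t) from (4). Within `cutoff` of the removable
    # singularity we return the limiting value (5); the cutoff is an
    # ad hoc choice to avoid floating-point cancellation.
    if abs(eps_t - eps) < cutoff:
        return (-1.0 / (eps * (1.0 - eps))) / (k * (k - 1) * eps ** (k - 2))
    num = 2.0 * (S0(eps_t) - S0(eps) - dS0(eps) * (eps_t - eps))
    den = eps_t ** k - eps ** k - k * eps ** (k - 1) * (eps_t - eps)
    return num / den

def zeta(k, eps, tol=1e-10):
    # Golden-section search for the maximizer of psi_k(eps, .) on (0, 1).
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = 1e-9, 1.0 - 1e-9
    x1 = b - invphi * (b - a)
    x2 = a + invphi * (b - a)
    f1, f2 = psi(k, eps, x1), psi(k, eps, x2)
    while b - a > tol:
        if f1 < f2:
            # The maximum lies in [x1, b].
            a, x1, f1 = x1, x2, f2
            x2 = a + invphi * (b - a)
            f2 = psi(k, eps, x2)
        else:
            # The maximum lies in [a, x2].
            b, x2, f2 = x2, x1, f1
            x1 = b - invphi * (b - a)
            f1 = psi(k, eps, x1)
    return 0.5 * (a + b)
```

For example, `zeta(2, 0.3)` can be compared with the closed form $1-\varepsilon$ that holds for $k=2$. Accuracy degrades when $\varepsilon$ is close to $(k-1)/k$, because the maximizer then sits next to the singularity cutoff.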

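Theorem 1.1. Let $H$ be a $k$-starlike graph with $\ell \ge 2$ edges. Let $\varepsilon \in (0,1)$ be any point
other than $(k-1)/k$. Then there is a number $\tau_0 > \varepsilon^\ell$ (depending on $\varepsilon$) such that for all
$\tau \in (\varepsilon^\ell,\tau_0)$, the entropy-maximizing graphon at $(\varepsilon,\tau)$ is unique (up to measure-preserving
transformations of $[0,1]$) and bipodal. The parameters $(c,p_{11},p_{12},p_{22})$ are analytic functions
of $\varepsilon$ and $\tau$ on the region $\varepsilon \ne (k-1)/k$, $\tau \in (\varepsilon^\ell,\tau_0(\varepsilon))$. Furthermore, as $\tau \searrow \varepsilon^\ell$ we have that
$p_{22} \to \varepsilon$, $p_{12} \to \zeta_k(\varepsilon)$, $p_{11}$ satisfies $S_0'(p_{11}) = 2S_0'(p_{12}) - S_0'(p_{22})$, and $c = O(\tau - \varepsilon^\ell)$.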

Theorem 1.1 proves that there is part of a phase just above the ER curve for $\varepsilon < (k-1)/k$
and also for $\varepsilon > (k-1)/k$; numerical evidence suggests these are in fact parts of a single
phase; the only 'singular' behavior is the manner in which the graphon approaches the
constant graphon associated with the ER curve. We will see in Theorem 1.2 that this
behavior is only slightly more complicated for general $H$ than it is for $k$-starlike $H$.

When $H$ has vertices with different degrees $> 1$, the problem resembles that of a formal
positive linear combination of $k$-stars. As in the $k$-starlike case, we first solve the problem
for the linear combination of $k$-stars and then use perturbation theory to extend the results
to arbitrary $H$.


Theorem 1.2. Let $H$ be an arbitrary graph with $\ell$ edges with at least one vertex of degree
2 or greater. Then there exists a finite set $B_H \subset (0,1)$ such that if $\varepsilon \notin B_H$, then there is
a number $\tau_0 > \varepsilon^\ell$ (depending on $\varepsilon$) such that for all $\tau \in (\varepsilon^\ell,\tau_0)$, the entropy-maximizing
graphon at $(\varepsilon,\tau)$ is unique (up to measure-preserving transformations of $[0,1]$) and bipodal.
The parameters $(c,p_{11},p_{12},p_{22})$ are analytic functions of $\varepsilon$ and $\tau$ on the region $\varepsilon \notin B_H$,
$\tau \in (\varepsilon^\ell,\tau_0(\varepsilon))$. Furthermore, as $\tau \searrow \varepsilon^\ell$ we have that $p_{22} \to \varepsilon$, $p_{12}$ approaches the maximizer
of an explicit function whose data depends on $\varepsilon$, $p_{11}$ satisfies $S_0'(p_{11}) = 2S_0'(p_{12}) - S_0'(p_{22})$,
and $c = O(\tau - \varepsilon^\ell)$.

The key differences between Theorems 1.1 and 1.2 are:

• For $k$-starlike graphs, the set $B_H$ of bad values of $\varepsilon$ consists of a single point, and this
point is explicitly known: $\varepsilon = (k-1)/k$.

• For $k$-starlike graphs, the behavior of $\zeta_k$ is explicit. It is a continuous and strictly
decreasing function of $\varepsilon$, and gives an involution of $(0,1)$. (That is, $\zeta_k(\zeta_k(\varepsilon)) = \varepsilon$.) For
$k = 2$ it is given by $\zeta_2(\varepsilon) = 1 - \varepsilon$. In the general case, the limiting value of $p_{12}$, and
its dependence on $\varepsilon$, appear to be much more complicated. We do not know whether
this limiting value is always continuous across the bad set $B_H$.

The organization of this paper is as follows. In Section 2 we review the formalism of
graphons and establish basic notation. In Section 3 we establish a number of technical
results for $k$-star models. Using these results, in Section 4 we prove Theorem 1.1 for the
case that $H$ is a $k$-star. In Section 5 we show that just above the ER curve a model
with an arbitrary $k$-starlike $H$ can be approximated by a $k$-star model. By bounding the
error terms, we prove Theorem 1.1 in full generality. In Section 6 we consider formal positive
linear combinations of $k$-stars, and prove a theorem much like Theorem 1.2 for those models.
Finally, in Section 7 we show that the model for an arbitrary $H$ can be approximated by a
formal linear combination of $k$-stars, thus completing the proof of Theorem 1.2.


2 Notation and background 


We consider a simple graph $G$ (undirected, with no multiple edges or loops) with a vertex
set $V(G)$ of labeled vertices. For a subgraph $H$ of $G$, let $T_H(G)$ be the set of maps from
$V(H)$ into $V(G)$ which preserve edges. The density $\tau_H(G)$ of $H$ in $G$ is then defined to be

$$\tau_H(G) := \frac{|T_H(G)|}{n^{|V(H)|}}, \tag{6}$$

where $n = |V(G)|$. An important special case is where $H$ is a '$k$-star', a graph with $k$ edges,
all with a common vertex, for which we use the notation $\tau_k(G)$. In particular $\tau_1(G)$, which
we also denote $\varepsilon(G)$, is the edge density of $G$.
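To make definition (6) concrete, here is a small Python sketch that counts edge-preserving maps by brute force and, for $k$-stars, uses the equivalent degree-sum count. The function names and the edge-list representation of graphs (vertices labeled $0,\dots,n-1$) are assumptions made for this example; the brute-force loop is only practical for very small graphs.

```python
from itertools import product

def hom_density(h_edges, h_n, g_edges, g_n):
    # Density (6): the fraction of all maps V(H) -> V(G) that send
    # every edge of H to an edge of G.
    adjacent = set()
    for u, v in g_edges:
        adjacent.add((u, v))
        adjacent.add((v, u))
    count = 0
    for phi in product(range(g_n), repeat=h_n):
        if all((phi[a], phi[b]) in adjacent for a, b in h_edges):
            count += 1
    return count / g_n ** h_n

def star_density(k, g_edges, g_n):
    # For a k-star the count of edge-preserving maps is sum_v deg(v)^k:
    # choose the image v of the center, then each leaf independently
    # among the neighbors of v.
    deg = [0] * g_n
    for u, v in g_edges:
        deg[u] += 1
        deg[v] += 1
    return sum(d ** k for d in deg) / g_n ** (k + 1)

# A path on four vertices and a triangle, used as small test graphs.
path4 = [(0, 1), (1, 2), (2, 3)]
triangle = [(0, 1), (1, 2), (0, 2)]
two_star = [(0, 1), (0, 2)]   # center 0 with leaves 1 and 2
```

With these definitions, `hom_density(two_star, 3, path4, 4)` and `star_density(2, path4, 4)` compute the same quantity in two ways, and `star_density(1, g_edges, g_n)` is the edge density $\varepsilon(G)$.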

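For $\alpha > 0$ and $\bar\tau = (\varepsilon,\tau_H)$ define $Z^{n,\alpha}_{\bar\tau}$ to be the number of graphs $G$ on $n$ vertices with
densities satisfying

$$\varepsilon(G) \in (\varepsilon-\alpha,\ \varepsilon+\alpha), \qquad \tau_H(G) \in (\tau_H-\alpha,\ \tau_H+\alpha). \tag{7}$$

Define the (constrained) entropy $s_{\bar\tau}$ to be the exponential rate of growth of $Z^{n,\alpha}_{\bar\tau}$ as a
function of $n$:

$$s_{\bar\tau} = \lim_{\alpha\searrow 0}\ \lim_{n\to\infty} \frac{\ln(Z^{n,\alpha}_{\bar\tau})}{n^2}. \tag{8}$$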


The double limit defining the entropy $s_{\bar\tau}$ is known to exist [13]. To analyze it we make use of
a variational characterization of $s_{\bar\tau}$, and for this we need further notation to analyze limits
of graphs as $n \to \infty$. (This work was recently developed in [8, 9, 2, 1, 10]; see also the
recent book [11].) The (symmetric) adjacency matrices of graphs on $n$ vertices are replaced,
in this formalism, by symmetric, measurable functions $g : [0,1]^2 \to [0,1]$; the former are
recovered by using a partition of $[0,1]$ into $n$ consecutive subintervals. The functions $g$ are
called graphons.

For a graphon $g$ define the degree function $d(x)$ to be $d(x) = \int_0^1 g(x,y)\,dy$. The $k$-star
density of $g$, $\tau_k(g)$, then takes the simple form

$$\tau_k(g) = \int_0^1 d(x)^k\,dx. \tag{9}$$

For any fixed graph $H$, the $H$-density $\tau_H$ of $g$ can be similarly expressed as an integral of a
product of factors $g(x_i,x_j)$.

The following is Theorem 4.1 in [14]:

Theorem 2.1 (The Variational Principle). For any feasible set $\bar\tau$ of values of the densities
$\bar\tau(g) := (\varepsilon,\tau_H)$ we have $s_{\bar\tau} = \max[s(g)]$, where the entropy is maximized over all graphons $g$
with $\bar\tau(g) = \bar\tau$.

(Instead of using $s(g)$, some authors use the rate function $I(g) := -s(g)$, and then minimize
$I$.) The existence of a maximizing graphon $g = g_{\bar\tau}$ for any constraint $\bar\tau(g) = \bar\tau$ was proven
in [13], again adapting a proof in [5]. If the densities are that of edges and $k$-star subgraphs
we refer to this maximization problem as a star model, though we emphasize that the result
applies much more generally [13, 14].

We consider two graphs equivalent if they are obtained from one another by relabeling
the vertices. For graphons, the analogous operation is applying a measure-preserving map $\varphi$
of $[0,1]$ into itself, replacing $g(x,y)$ with $g(\varphi(x),\varphi(y))$, see [11]. The equivalence classes
of graphons under relabeling are called reduced graphons, and graphons are equivalent if
and only if they have the same subgraph densities for all possible finite subgraphs [11].
In the remaining sections of the paper, whenever we claim that a graphon has a property
(e.g. monotonicity in $x$ and $y$, or uniqueness as an entropy maximizer), the caveat "up to
relabeling" is implied.

The graphons which maximize the constrained entropy can tell us what 'most' or 'typical'
large constrained graphs are like: if $g_{\bar\tau}$ is the only reduced graphon maximizing $s(g)$ with
$\bar\tau(g) = \bar\tau$, then as the number $n$ of vertices diverges and $\alpha_n \to 0$, exponentially most graphs
with densities $\tau_i(G) \in (\tau_i - \alpha_n,\ \tau_i + \alpha_n)$ will have reduced graphon close to $g_{\bar\tau}$ [13]. This
is based on large deviations from [5]. We emphasize that this interpretation requires that
the maximizer be unique; this has been difficult to prove in most cases of interest and is an
important focus of this work.

A graphon $g$ is called $M$-podal if there is a decomposition of $[0,1]$ into $M$ intervals ('vertex
clusters') $C_j$, $j = 1,2,\dots,M$, and $M(M+1)/2$ constants $p_{ij}$ such that $g(x,y) = p_{ij}$ if
$(x,y) \in C_i \times C_j$ (and $p_{ji} = p_{ij}$). We denote the length of $C_j$ by $c_j$.

3 Technical properties of star models 

For each star model, all entropy-maximizing graphons are multipodal with a fixed upper
bound on the number of clusters, also called the podality [6]. For any fixed podality $M$,
an $M$-podal graphon is described by $N = M(M+3)/2$ parameters, namely the values $p_{ij}$
($1 \le i \le j \le M$) and the widths $c_i$ ($1 \le i \le M$) of the clusters. When it does not cause
confusion, we will use $g$ to denote the vector

$$(c_1,\dots,c_M,\ p_{11},\dots,p_{1M},\ p_{22},\dots,p_{2M},\ \dots,\ p_{M-1,M-1},\ p_{M-1,M},\ p_{MM}), \tag{10}$$

which contains all these parameters. The problem of optimizing the graphon then reduces
to a finite-dimensional calculus problem. To be precise, let us recall that for an $M$-podal
graphon, we have

$$\varepsilon(g) = \sum_{1\le i,j\le M} c_i c_j p_{ij}, \qquad \tau_k(g) = \sum_{1\le i\le M} c_i d_i^{k}, \qquad s(g) = \sum_{1\le i,j\le M} c_i c_j S_0(p_{ij}), \tag{11}$$

where $d_i = \sum_{1\le j\le M} c_j p_{ij}$ is the value of the degree function on the $i$th cluster. The problem
of searching for entropy-maximizing graphons with fixed edge density $\varepsilon$ and $k$-star density
$\tau$ can now be formulated as

$$\max_{g\in[0,1]^N} s(g), \quad \text{subject to:}\quad \varepsilon(g) - \varepsilon = 0, \quad \tau_k(g) - \tau = 0, \quad C(g) = 1, \tag{12}$$

where $C(g) = \sum_{1\le j\le M} c_j$.
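The finite sums in (11) are straightforward to evaluate. The following Python sketch computes $\varepsilon(g)$, $\tau_k(g)$ and $s(g)$ for an $M$-podal graphon given by its cluster widths and a symmetric matrix of values; the function names and the list-of-lists representation are assumptions made for this example.

```python
import math

def S0(w):
    # Entropy density S0 from (2), with S0(0) = S0(1) = 0.
    if w <= 0.0 or w >= 1.0:
        return 0.0
    return -0.5 * (w * math.log(w) + (1.0 - w) * math.log(1.0 - w))

def multipodal_functionals(c, p, k):
    # c: list of cluster widths c_i (summing to 1)
    # p: symmetric M x M list of lists with entries p_ij in [0, 1]
    # Returns (edge density, k-star density, entropy) as in (11).
    M = len(c)
    # d_i = sum_j c_j p_ij is the degree function on cluster i.
    d = [sum(c[j] * p[i][j] for j in range(M)) for i in range(M)]
    eps = sum(c[i] * c[j] * p[i][j] for i in range(M) for j in range(M))
    tau = sum(c[i] * d[i] ** k for i in range(M))
    s = sum(c[i] * c[j] * S0(p[i][j]) for i in range(M) for j in range(M))
    return eps, tau, s
```

A constant graphon with value $\varepsilon$ lands on the ER curve, with $\tau_k = \varepsilon^k$ and $s = S_0(\varepsilon)$, while a non-constant bipodal graphon such as `c = [0.1, 0.9]`, `p = [[0.9, 0.7], [0.7, 0.3]]` has $\tau_k > \varepsilon^k$.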

The following result says that the maximization problem (12) can be solved using the
method of Lagrange multipliers. The existence of finite Lagrange multipliers was previously
established in [6], treating the space of graphons as a linear space of functions $[0,1]^2 \to [0,1]$,
intuitively considering perturbations of graphons localized about points in $[0,1]^2$. For star
models we may restrict to $M$-podal graphons, as noted above, and thus consider perturbations
in the relevant parameters $p_{ij}$ and $c_j$.

Lemma 3.1. Let $g$ be a local maximizer in (12). Then for constraints $(\varepsilon,\tau)$ off the ER curve,
there exist unique $\alpha,\beta,\gamma \in \mathbb{R}$ such that

$$\nabla s(g) - \alpha\nabla\varepsilon(g) - \beta\nabla\tau_k(g) - \gamma\nabla C(g) = 0. \tag{13}$$

We do not include the proof, which follows easily from that of Lemma 3.5 in [6]. We
also note that one can remove the variable $c_M$ and the constraint $C(g) = 1$, eliminating the
multiplier $\gamma$.

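For convenience later, we now write down the exact form of the Euler-Lagrange equation (13).
We first verify that

$$\frac{\partial\varepsilon}{\partial p_{ij}} = A_{ij}, \qquad \frac{\partial\varepsilon}{\partial c_i} = 2\sum_{j=1}^{M} c_j p_{ij} = 2d_i, \tag{14}$$

$$\frac{\partial\tau_k}{\partial p_{ij}} = \frac{k}{2}A_{ij}\,(d_i^{k-1} + d_j^{k-1}), \qquad \frac{\partial\tau_k}{\partial c_i} = d_i^{k} + k\sum_{j=1}^{M} c_j d_j^{k-1} p_{ij}, \tag{15}$$

$$\frac{\partial C}{\partial p_{ij}} = 0, \qquad \frac{\partial C}{\partial c_i} = 1, \tag{16}$$

$$\frac{\partial s}{\partial p_{ij}} = A_{ij}S_0'(p_{ij}), \qquad \frac{\partial s}{\partial c_i} = 2\sum_{j=1}^{M} c_j S_0(p_{ij}), \tag{17}$$

where $A_{ij} = 2c_ic_j$ if $i \ne j$ and $A_{ij} = c_i^2$ if $i = j$. We can then write down (13) explicitly as

$$S_0'(p_{ij}) = \alpha + \frac{k}{2}\beta\,(d_i^{k-1} + d_j^{k-1}), \qquad 1 \le i \le j \le M, \tag{18}$$

$$2\sum_{j=1}^{M} c_j S_0(p_{ij}) = 2\alpha d_i + \beta\Big(d_i^{k} + k\sum_{j=1}^{M} c_j d_j^{k-1} p_{ij}\Big) + \gamma, \qquad 1 \le i \le M. \tag{19}$$

These Euler-Lagrange equations, together with the constraints,

$$\varepsilon(g) - \varepsilon = 0, \qquad \tau_k(g) - \tau = 0, \qquad C(g) - 1 = 0, \tag{20}$$

are the optimality conditions for the maximization problem (12). In principle, we can solve
this system to find the maximizer $g$.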
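The partial derivatives of $\tau_k$ that enter the Euler-Lagrange equations can be sanity-checked against central finite differences. The Python sketch below does this for the formulas $\partial\tau_k/\partial c_i = d_i^k + k\sum_j c_j d_j^{k-1}p_{ij}$ and $\partial\tau_k/\partial p_{ij} = \tfrac{k}{2}A_{ij}(d_i^{k-1}+d_j^{k-1})$; the function names, step size and sample parameters are assumptions made for this example. As in the Lagrange-multiplier setup, the $c_i$ are treated as independent variables, and perturbing $p_{ij}$ with $i \ne j$ also perturbs the symmetric entry $p_{ji}$.

```python
def degrees(c, p):
    # d_i = sum_j c_j p_ij
    M = len(c)
    return [sum(c[j] * p[i][j] for j in range(M)) for i in range(M)]

def tau_k(c, p, k):
    # k-star density of an M-podal graphon: sum_i c_i d_i^k
    return sum(ci * di ** k for ci, di in zip(c, degrees(c, p)))

def dtau_dc(c, p, k, i):
    # Closed form for the derivative with respect to the width c_i.
    d = degrees(c, p)
    return d[i] ** k + k * sum(c[j] * d[j] ** (k - 1) * p[i][j]
                               for j in range(len(c)))

def dtau_dp(c, p, k, i, j):
    # Closed form for the derivative with respect to p_ij (i <= j),
    # with A_ij = 2 c_i c_j off the diagonal and c_i^2 on it.
    d = degrees(c, p)
    A = c[i] ** 2 if i == j else 2.0 * c[i] * c[j]
    return 0.5 * k * A * (d[i] ** (k - 1) + d[j] ** (k - 1))

def fd_dc(c, p, k, i, h=1e-6):
    # Central difference in c_i, other parameters held fixed.
    cp, cm = list(c), list(c)
    cp[i] += h
    cm[i] -= h
    return (tau_k(cp, p, k) - tau_k(cm, p, k)) / (2.0 * h)

def fd_dp(c, p, k, i, j, h=1e-6):
    # Central difference in p_ij, keeping the matrix symmetric.
    pp = [row[:] for row in p]
    pm = [row[:] for row in p]
    pp[i][j] += h
    pm[i][j] -= h
    if i != j:
        pp[j][i] += h
        pm[j][i] -= h
    return (tau_k(c, pp, k) - tau_k(c, pm, k)) / (2.0 * h)
```

The same pattern applies to $\varepsilon(g)$ and $s(g)$; only $\tau_k$ is shown to keep the sketch short.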

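Next we consider the significance of the Lagrange multipliers $\alpha$ and $\beta$. Suppose that $g_0$
is the unique entropy maximizer for $\varepsilon = \varepsilon_0$ and $\tau = \tau_0$. Then any sequence of graphons that
maximize entropy for $(\varepsilon,\tau)$ approaching $(\varepsilon_0,\tau_0)$ must approach $g_0$; this follows from upper
semicontinuity of the entropy and the fact that we can perturb $g_0$ to any nearby $(\varepsilon,\tau)$ by
changing some $p_{ij}$. But if $g = g_0 + \delta g$, then

$$\begin{aligned}
s(g) &= s(g_0) + ds_{g_0}(\delta g) + O(\delta g^2) \\
&= s(g_0) + \alpha\,d\varepsilon_{g_0}(\delta g) + \beta\,d\tau_{g_0}(\delta g) + O(\delta g^2) \\
&= s(g_0) + \alpha(\varepsilon - \varepsilon_0) + \beta(\tau - \tau_0) + O(\delta g^2).
\end{aligned} \tag{21}$$

That is, $\partial s(\varepsilon,\tau)/\partial\varepsilon = \alpha$ and $\partial s(\varepsilon,\tau)/\partial\tau = \beta$.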

If $g_0$ is not a unique entropy maximizer, then we only have one-sided (directional) derivatives:

Lemma 3.2. The function $s(\varepsilon,\tau)$ admits directional derivatives in all directions at all points
$(\varepsilon,\tau)$ in the interior of the profile.

Proof. The change in entropy in a given direction is obtained by maximizing $ds = \alpha\,d\varepsilon + \beta\,d\tau$
over all entropy maximizers at $(\varepsilon_0,\tau_0)$. That is, when fixing $\varepsilon$ and increasing $\tau$, we get the
largest $\beta$ of all the graphons that maximize entropy at $(\varepsilon_0,\tau_0)$, and when decreasing $\tau$ we
get the smallest $\beta$. Likewise, when increasing or decreasing $\varepsilon$ we get the largest or smallest
values of $\alpha$, and when doing a directional derivative in the direction $(v_1,v_2)$, we get the
largest value of $v_1\alpha + v_2\beta$. □


Existence of directional derivatives implies the fundamental theorem of calculus, so for
fixed $\varepsilon$ we can write

$$s(\varepsilon,\tau) = s(\varepsilon,\varepsilon^k) + \int_{\varepsilon^k}^{\tau} \beta(g_{\max}(\varepsilon,\tau'))\,d\tau', \tag{22}$$

where $g_{\max}(\varepsilon,\tau)$ is the entropy-maximizing graphon at $(\varepsilon,\tau)$ that maximizes its right derivative
(with respect to $\tau$).


Before proving Theorem 1.1 for $k$-stars, we record some properties of the function $\psi_k(\varepsilon,\tilde\varepsilon)$
of (4) and its critical points.

Theorem 3.3. For fixed $k$ and $\varepsilon$, there is a unique solution to $\partial\psi_k(\varepsilon,\tilde\varepsilon)/\partial\tilde\varepsilon = 0$, which we
denote $\tilde\varepsilon = \zeta_k(\varepsilon)$. The function $\zeta_k$ is strictly decreasing, with nowhere-vanishing derivative
and with fixed point at $\varepsilon = (k-1)/k$. Furthermore, $\zeta_k$ is an involution: $\tilde\varepsilon = \zeta_k(\varepsilon)$ if and
only if $\varepsilon = \zeta_k(\tilde\varepsilon)$.

Even though the proof is elementary we will need some parts of it later, so we give it
here.


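Proof. Fix $k \ge 2$ and let

$$N(\varepsilon,\tilde\varepsilon) = 2\,[S_0(\tilde\varepsilon) - S_0(\varepsilon) - S_0'(\varepsilon)(\tilde\varepsilon - \varepsilon)], \qquad D(\varepsilon,\tilde\varepsilon) = \tilde\varepsilon^{k} - \varepsilon^{k} - k\varepsilon^{k-1}(\tilde\varepsilon - \varepsilon) \tag{23}$$

be the numerator and denominator of the function $\psi_k(\varepsilon,\tilde\varepsilon) = N/D$. Note that these definitions
make sense for all real values of $k$, not just for integers. When taking derivatives
of $N$, $D$ and $\psi$, we will denote a derivative with respect to the first variable by a dot,
and a derivative with respect to the second variable by a prime. That is, $D'(\varepsilon,\tilde\varepsilon) = \partial D/\partial\tilde\varepsilon$ and
$\dot D(\varepsilon,\tilde\varepsilon) = \partial D/\partial\varepsilon$. As noted earlier, this definition of $\psi_k$ has a removable singularity at $\tilde\varepsilon = \varepsilon$,
which we fill in by defining

$$\psi_k(\varepsilon,\varepsilon) = N''(\varepsilon,\varepsilon)/D''(\varepsilon,\varepsilon) = 2S_0''(\varepsilon)/[k(k-1)\varepsilon^{k-2}]. \tag{24}$$

The denominator $D$ vanishes only at $\tilde\varepsilon = \varepsilon$.

Some useful explicit derivatives are:

$$\begin{aligned}
N' &= 2\,[S_0'(\tilde\varepsilon) - S_0'(\varepsilon)], & N'' &= 2S_0''(\tilde\varepsilon) = -\frac{1}{\tilde\varepsilon(1-\tilde\varepsilon)}, \\
\dot N &= -2S_0''(\varepsilon)(\tilde\varepsilon - \varepsilon), & \dot N' &= -2S_0''(\varepsilon), \\
D' &= k(\tilde\varepsilon^{k-1} - \varepsilon^{k-1}), & D'' &= k(k-1)\tilde\varepsilon^{k-2}, \\
\dot D &= -k(k-1)\varepsilon^{k-2}(\tilde\varepsilon - \varepsilon), & \dot D' &= -k(k-1)\varepsilon^{k-2}.
\end{aligned} \tag{25}$$

Note that $D$ and $N$ both vanish when $\tilde\varepsilon = \varepsilon$, so we can write

$$N(\varepsilon,\tilde\varepsilon) = \int_{\varepsilon}^{\tilde\varepsilon} N'(\varepsilon,x)\,dx = \int_{\tilde\varepsilon}^{\varepsilon} \dot N(x,\tilde\varepsilon)\,dx, \tag{26}$$

and similarly for $D(\varepsilon,\tilde\varepsilon)$.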

We proceed in steps:

Step 1. Analyzing $\psi_k$ near $\tilde\varepsilon = \varepsilon$ to see that $\psi_k'(\varepsilon,\varepsilon) = 0$ only when $\varepsilon = (k-1)/k$.

Step 2. Showing that we can never have $\psi_k' = \psi_k'' = 0$.

Step 3. Showing that the equation $\psi_k'(\varepsilon,\tilde\varepsilon) = 0$ is symmetric in $\varepsilon$ and $\tilde\varepsilon$, implying that $\zeta_k$ is an
involution.

Step 4. Showing that $\psi_k$ has a unique critical point.

Step 5. Showing that $d\zeta_k/d\varepsilon$ is never zero.

The following calculus fact will be used repeatedly. When $D \ne 0$, $\psi_k' = 0$ is equivalent
to $N/D = N'/D'$, and $\psi_k' = \psi_k'' = 0$ is equivalent to $N/D = N'/D' = N''/D''$. This follows
from the quotient rule:

$$\psi' = \frac{DN' - ND'}{D^2}, \qquad \psi'' = \frac{DN'' - ND''}{D^2} - \frac{2D'(DN' - ND')}{D^3}. \tag{27}$$



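Step 1. Since $N$ and $D$ have double roots at $\tilde\varepsilon = \varepsilon$, we can do a Taylor series for both of
them near $\tilde\varepsilon = \varepsilon$:

$$\psi_k(\varepsilon,\tilde\varepsilon) = \frac{N''(\varepsilon,\varepsilon)(\tilde\varepsilon-\varepsilon)^2/2 + N'''(\varepsilon,\varepsilon)(\tilde\varepsilon-\varepsilon)^3/6 + \cdots}{D''(\varepsilon,\varepsilon)(\tilde\varepsilon-\varepsilon)^2/2 + D'''(\varepsilon,\varepsilon)(\tilde\varepsilon-\varepsilon)^3/6 + \cdots} = \frac{N''(\varepsilon,\varepsilon) + N'''(\varepsilon,\varepsilon)(\tilde\varepsilon-\varepsilon)/3 + \cdots}{D''(\varepsilon,\varepsilon) + D'''(\varepsilon,\varepsilon)(\tilde\varepsilon-\varepsilon)/3 + \cdots}. \tag{28}$$

$\psi_k'(\varepsilon,\varepsilon) = 0$ is then equivalent to

$$\begin{aligned}
N''(\varepsilon,\varepsilon)\,D'''(\varepsilon,\varepsilon) &= N'''(\varepsilon,\varepsilon)\,D''(\varepsilon,\varepsilon), \\
-\frac{k(k-1)(k-2)\varepsilon^{k-3}}{\varepsilon(1-\varepsilon)} &= \frac{k(k-1)\varepsilon^{k-2}(1-2\varepsilon)}{\varepsilon^2(1-\varepsilon)^2}, \\
(k-2)(1-\varepsilon) &= 2\varepsilon - 1, \\
k\varepsilon &= k-1.
\end{aligned} \tag{29}$$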


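Step 2. If $\psi_k' = \psi_k'' = 0$, then we must have $N'D'' = D'N''$ and $ND'' = DN''$. We will
explore these in turn. We write

$$0 = N'D'' - D'N'' = \int_{\tilde\varepsilon}^{\varepsilon} \big[D''(\varepsilon,\tilde\varepsilon)\,\dot N'(x,\tilde\varepsilon) - N''(\varepsilon,\tilde\varepsilon)\,\dot D'(x,\tilde\varepsilon)\big]\,dx. \tag{30}$$

Explicitly, this becomes

$$0 = \int_{\tilde\varepsilon}^{\varepsilon} \frac{k(k-1)}{x(1-x)\,\tilde\varepsilon(1-\tilde\varepsilon)}\,\big[\tilde\varepsilon^{k-1}(1-\tilde\varepsilon) - x^{k-1}(1-x)\big]\,dx. \tag{31}$$

The function $x^{k-1}(1-x)$ has a single maximum at $x = (k-1)/k$. If both $\varepsilon$ and $\tilde\varepsilon$ are
on the same side of this maximum, then the integrand will have the same sign for all $x$
between $\varepsilon$ and $\tilde\varepsilon$, and the integral will not be zero. Thus we must have $\varepsilon < (k-1)/k < \tilde\varepsilon$, or
vice-versa, and we must have $\varepsilon^{k-1}(1-\varepsilon) < \tilde\varepsilon^{k-1}(1-\tilde\varepsilon)$. Note that in this case the integrand
changes sign exactly once.

Now we apply the same sort of analysis to the other equation:

$$0 = ND'' - DN'' = \int_{\tilde\varepsilon}^{\varepsilon} \big[D''(\varepsilon,\tilde\varepsilon)\,\dot N(x,\tilde\varepsilon) - N''(\varepsilon,\tilde\varepsilon)\,\dot D(x,\tilde\varepsilon)\big]\,dx. \tag{32}$$

Explicitly, this becomes

$$0 = \int_{\tilde\varepsilon}^{\varepsilon} \frac{k(k-1)}{x(1-x)\,\tilde\varepsilon(1-\tilde\varepsilon)}\,\big[\tilde\varepsilon^{k-1}(1-\tilde\varepsilon) - x^{k-1}(1-x)\big]\,(\tilde\varepsilon - x)\,dx. \tag{33}$$

This is the same integral as before, only with an extra factor of $(\tilde\varepsilon - x)$. If we view the
first integral (31) as a mass distribution (with total mass zero), then the second integral is
(minus) the first moment of this mass distribution relative to the endpoint $\tilde\varepsilon$. But we have
already seen that the distribution changes sign exactly once, and so must have a non-zero
first moment. This is a contradiction.

Step 3. If $ND' = DN'$, then $N/D = N'/D'$. Call this common ratio $r$. Then

$$N = rD \quad\text{and}\quad N' = rD'. \tag{34}$$

Note that $N'$ and $D'$ are odd under interchange of $\varepsilon$ and $\tilde\varepsilon$, so the second equation is invariant
under this interchange. Furthermore, we have $(\tilde\varepsilon - \varepsilon)N' - N = r[(\tilde\varepsilon - \varepsilon)D' - D]$. However,
$(\tilde\varepsilon - \varepsilon)N' - N$ is the same as $N$ with the roles of $\varepsilon$ and $\tilde\varepsilon$ reversed, while $(\tilde\varepsilon - \varepsilon)D' - D$ is the
same as $D$ with the roles of $\varepsilon$ and $\tilde\varepsilon$ reversed. Thus the two equations are satisfied for $(\varepsilon,\tilde\varepsilon)$
if and only if they are satisfied for $(\tilde\varepsilon,\varepsilon)$.

Step 4. For $k = 2$ we explicitly compute that $\psi_2' = 0$ only at $\tilde\varepsilon = 1 - \varepsilon$. If $k_0$ is the
infimum of all values of $k$ for which $\psi_k$ has multiple critical points, then at a critical point
of $\psi_{k_0}$ we must have $\psi_{k_0}' = \psi_{k_0}'' = 0$, which is a contradiction. Thus $k_0$ does not exist, and
$\psi_k$ has a unique critical point for all $k \ge 2$. In particular, $\zeta_k$ is a well-defined function.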


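Step 5. The function $\zeta_k$ is defined by the condition that $DN' - ND' = 0$ (and $\tilde\varepsilon \ne \varepsilon$,
except when $\varepsilon = (k-1)/k$). Let $f(\varepsilon,\tilde\varepsilon) = DN' - ND' = D^2\psi'$. Moving along the curve
$\tilde\varepsilon = \zeta_k(\varepsilon)$ (that is, $f = 0$), we differentiate implicitly:

$$0 = df = \dot f\,d\varepsilon + f'\,d\tilde\varepsilon, \tag{35}$$

$$\frac{d\tilde\varepsilon}{d\varepsilon} = -\frac{\dot f}{f'}. \tag{36}$$

We compute $f' = DN'' - ND''$. This is nonzero by Step 2. We also have

$$\begin{aligned}
\dot f &= \dot D N' - \dot N D' + D\dot N' - N\dot D' \\
&= -2S_0''(\varepsilon)\big(D - (\tilde\varepsilon-\varepsilon)D'\big) + k(k-1)\varepsilon^{k-2}\big(N - (\tilde\varepsilon-\varepsilon)N'\big) \\
&= 2S_0''(\varepsilon)\big[\varepsilon^{k} - \tilde\varepsilon^{k} + k(\tilde\varepsilon-\varepsilon)\tilde\varepsilon^{k-1}\big] - 2k(k-1)\varepsilon^{k-2}\big[S_0(\varepsilon) - S_0(\tilde\varepsilon) + (\tilde\varepsilon-\varepsilon)S_0'(\tilde\varepsilon)\big] \\
&= D(\tilde\varepsilon,\varepsilon)\,N''(\tilde\varepsilon,\varepsilon) - N(\tilde\varepsilon,\varepsilon)\,D''(\tilde\varepsilon,\varepsilon).
\end{aligned} \tag{37}$$

The arguments in the last line are written in the correct order! That is, $\dot f$ is the same as $f'$,
only with the roles of $\varepsilon$ and $\tilde\varepsilon$ reversed. Since the equation $f = 0$ is symmetric in $\varepsilon$ and $\tilde\varepsilon$,
the argument of Step 2 can be repeated to show that $\dot f \ne 0$.

Since $d\tilde\varepsilon/d\varepsilon$ is never zero, and since $d\tilde\varepsilon/d\varepsilon = -1$ at the fixed point (by symmetry),
$\zeta_k'(\varepsilon) = d\tilde\varepsilon/d\varepsilon$ must always be negative. □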


4 Theorem 1.1 for k-stars

Theorem 4.1. Let $H$ be a $k$-star and suppose that $\varepsilon \ne (k-1)/k$. Then there exists a
number $\tau_0 > \varepsilon^k$ such that for all $\tau \in (\varepsilon^k,\tau_0)$, the entropy-optimizing graphon at $(\varepsilon,\tau)$ is
unique and bipodal. The parameters $(c,p_{11},p_{12},p_{22})$ are analytic functions of $\varepsilon$ and $\tau$. As $\tau$
approaches $\varepsilon^k$ from above, $p_{22} \to \varepsilon$, $p_{12} \to \zeta_k(\varepsilon)$, $p_{11}$ satisfies $S_0'(p_{11}) = 2S_0'(p_{12}) - S_0'(p_{22})$
and $c = O(\tau - \varepsilon^k)$.

Proof. The entropy-maximizing graphon for each $(\varepsilon,\tau)$ is multipodal [6], and the parameters
$\{c_j\}$ and $\{p_{ij}\}$ must satisfy the optimality conditions (18), (19). The first step of the proof
is to estimate the terms in the optimality equations to within $o(1)$. This will determine the
solutions to within $o(1)$ and demonstrate that our optimizing graphon is close to bipodal
of the desired form. The second step, based on a separate argument, will show that the
optimizer is exactly bipodal. The third step shows that the optimizer is in fact unique.

In doing our asymptotic analysis, our small parameter is $\Delta\tau := \tau - \varepsilon^k$. But we could just
as well use $\Delta s := s(g) - S_0(\varepsilon)$ or the squared norm of $\Delta g := g - g_0$, where $g_0(x,y) = \varepsilon$ (here
$g$ denotes the graphon as a function $[0,1]^2 \to [0,1]$, not a vector of multipodal parameters).
We claim that these are all of the same order. Through arguments found in [14], one can
bound $\Delta\tau$ above by a multiple of $\|\Delta g\|^2$ and bound $|\Delta s|$ below by a multiple of $\|\Delta g\|^2$. By
considering a bipodal graphon with $p_{11} = p_{12} = \zeta_k(\varepsilon)$ and $p_{22}$ close to $\varepsilon$, we can bound $|\Delta s|$
above by a constant times $\Delta\tau$. This shows that $O(\Delta s) = O(\Delta\tau)$, and $O(\|\Delta g\|^2)$ is trapped
in between.

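Order the clusters so that the largest cluster is the last cluster (of length $c_M$). By
subtracting the equation (19) for $c_M$ from the equations for $c_i$, we eliminate $\gamma$ from our
equations:

$$\begin{aligned}
S_0'(p_{ij}) &= \alpha + \frac{k}{2}\beta\,(d_i^{k-1} + d_j^{k-1}), \\
2\sum_{j=1}^{M} c_j\big(S_0(p_{ij}) - S_0(p_{Mj})\big) &= 2\alpha(d_i - d_M) + \beta\Big(d_i^{k} - d_M^{k} + k\sum_{j=1}^{M} c_j d_j^{k-1}(p_{ij} - p_{Mj})\Big).
\end{aligned} \tag{38}$$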

Step 1. Since $\|\Delta g\|$ is small, the area of the region where $g(x,y)$ differs substantially from
$\varepsilon$ must be small. Thus all clusters must either have $d_i$ close to $\varepsilon$ or $c_i$ close to zero (or both).
We call a cluster Type I if $c_i$ is close to 0 and Type II if $d_i$ is close to $\varepsilon$. (If a cluster meets
both conditions, we arbitrarily throw it into one camp or the other.) The first equation in
(38) implies that, for fixed $i$, the values of $p_{ij}$ are nearly constant for all $j$ of Type II. Since
the $c_j$'s are small for $j$ of Type I, this common value must be close to $d_i$. To within $o(1)$,
our equations then simplify to

$$\begin{aligned}
S_0'(d_i) &= \alpha + \frac{k}{2}\beta\,(d_i^{k-1} + \varepsilon^{k-1}), \\
S_0(d_i) - S_0(\varepsilon) &= \alpha(d_i - \varepsilon) + \frac{\beta}{2}\big[d_i^{k} - \varepsilon^{k} + k\varepsilon^{k-1}(d_i - \varepsilon)\big].
\end{aligned} \tag{39}$$

Since $d_M = \varepsilon + o(1)$, the first of those equations applied to $d_M$ implies that

$$\alpha + k\varepsilon^{k-1}\beta = S_0'(\varepsilon) + o(1). \tag{40}$$

We can thus replace $\alpha$ with $S_0'(\varepsilon) - k\varepsilon^{k-1}\beta + o(1)$ throughout. This gives the equations
(again with $o(1)$ errors):

$$\begin{aligned}
2\big(S_0'(d_i) - S_0'(\varepsilon)\big) &= k\beta\,(d_i^{k-1} - \varepsilon^{k-1}), \\
2\big[S_0(d_i) - S_0(\varepsilon) - S_0'(\varepsilon)(d_i - \varepsilon)\big] &= \beta\big[d_i^{k} - \varepsilon^{k} - k\varepsilon^{k-1}(d_i - \varepsilon)\big].
\end{aligned} \tag{41}$$

There are two solutions to these equations. One is simply to have $d_i = \varepsilon$, in which case
both equations say $0 = 0$. Indeed, we already know that there must be clusters with $d_i$ close
to $\varepsilon$. In looking for solutions with $d_i \ne \varepsilon$, the second equation says that $\beta = \psi_k(\varepsilon,d_i)$.
We can also divide the first equation by the second to eliminate $\beta$. This gives an equation
that is algebraically equivalent to $\partial\psi_k(\varepsilon,d_i)/\partial d_i = 0$. In other words, $d_i$ must be the unique
critical point $\zeta_k(\varepsilon)$ of $\psi_k$, and $\beta$ must be the critical value. In fact, the critical point is a
maximum of $\psi_k$. Remember that $s(\varepsilon,\tau) = s(\varepsilon,\varepsilon^k) + \int\beta$ from (22). Since the computation
of $\beta$ is independent of $\Delta\tau$ (to lowest order), we have $s(\varepsilon,\tau) - s(\varepsilon,\varepsilon^k) = \beta\,\Delta\tau + o(\Delta\tau)$, so
maximizing $\beta$ is tantamount to maximizing $s$.

Step 2. We have shown so far that the optimizing graphon is multipodal, with all of the
clusters either having $d_i$ close to $\zeta_k(\varepsilon)$ or close to $\varepsilon$. We refine our definitions of Type I and
Type II so that all the clusters with $d_i$ close to $\zeta_k(\varepsilon)$ are Type I and all the clusters with
$d_i$ close to $\varepsilon$ are Type II. Since the value of $g(x,y)$ is determined by $d(x)$ and $d(y)$ (and
$\alpha$ and $\beta$), this means that the optimizing graphon is nearly constant (i.e. with pointwise
small fluctuations) on each quadrant. We order the clusters so that the Type I clusters come
before Type II.

Let $g_b$ be the bipodal graphon obtained by averaging over each quadrant. Let $\Delta g_f = g - g_b$.
(The $f$ stands for "further".) We will show that having $\Delta g_f$ non-zero is an inefficient
way to increase $\tau$, that is, $(s(g) - s(g_b))/(\tau(g) - \tau(g_b))$ is less than $\beta$. This will imply that
$\Delta g_f = 0$ and so $g = g_b$.

Since $\tau = \int_0^1 d(x)^k\,dx$, the changes in $\tau$ are a function only of the marginal distributions
of $\Delta g_f$. Once these are fixed, the values of $\Delta g_f$ on each quadrant must take the form

$$\Delta g_f(x,y) = (\text{function of } x) + (\text{function of } y). \tag{42}$$

The reason is that we can write the entropy on each quadrant as $\iint S_0(g_b + \Delta g_f) =
\iint S_0(g_b) + S_0'(g_b)\Delta g_f + \tfrac{1}{2}S_0''(g_b)\Delta g_f^2 + \cdots$. The first term is independent of $\Delta g_f$ and
the second is zero (since $g_b$ was assumed to equal the average value of $g_b + \Delta g_f$ on the
quadrant). Since the changes to the graphon are pointwise small, we can ignore terms beyond
second order, so we are basically left with $S_0''(g_b)/2$ times the squared norm of $\Delta g_f$ on the
quadrant, which we then minimize subject to the constraint that the marginal distributions
are fixed. We can write $\Delta g_f(x,y) = \phi_1(x) + \phi_2(y) + \phi_3(x,y)$, where $\phi_1$ and $\phi_2$ give the two
fixed marginals, and $\phi_3$ has zero marginals. But then $\iint \Delta g_f^2 = \iint \phi_1^2 + \phi_2^2 + \phi_3^2$, since all of
the cross terms integrate to zero. (Integrating $\phi_2(y)\phi_3(x,y)$ over $x$ or $\phi_1(x)\phi_3(x,y)$ over $y$
gives zero since $\phi_3$ has zero marginals, and integrating $\phi_1(x)\phi_2(y)$ over either $x$ or $y$ gives
zero since $\phi_1$ and $\phi_2$ have mean zero.) The way to minimize $\iint \Delta g_f^2$ is simply to take $\phi_3 = 0$.
This establishes (42).

Furthermore, to maximize $\tau(g) - \tau(g_b)$, the functions of $x$ should be the same (up to
scale) in the I-I and I-II quadrants, and the same (up to scale) in the II-I and II-II
quadrants. This is because $\tau(g) - \tau(g_b) \approx \tfrac{k(k-1)}{2}\int d(x)^{k-2}\,\delta d(x)^2\,dx$ involves a cross term
between the contributions to $\delta d(x)$ from two quadrants, and this cross term is maximized
when the corresponding functions point in the same direction.

The upshot is that there are functions $F_1(x)$ on $[0,c]$ and $F_2(x)$ on $[c,1]$, each with mean
zero and normalized to have root-mean-squared 1, and constants $\mu$, $\nu$, $\kappa$, $\lambda$, such that

$$\begin{aligned}
\Delta g_f(x,y) &= \mu F_1(x) + \mu F_1(y) && \text{on the I-I square,} \\
\Delta g_f(x,y) &= \nu F_1(x) + \kappa F_2(y) && \text{on the I-II rectangle,} \\
\Delta g_f(x,y) &= \lambda F_2(x) + \lambda F_2(y) && \text{on the II-II square.}
\end{aligned} \tag{43}$$

Now we compute the changes in $\tau$ and in $s$, to second order in $(\mu,\nu,\kappa,\lambda)$, noting that all
of the first-order changes are zero, and that the integrals of $\Delta g_f^2$ over the I-I square, the two
rectangles, and the II-II square are $2c^2\mu^2$, $2c(1-c)(\nu^2 + \kappa^2)$, and $2(1-c)^2\lambda^2$, respectively.


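$$\begin{aligned}
s(g) - s(g_b) &= \mu^2c^2S_0''(p_{11}) + \nu^2c(1-c)S_0''(p_{12}) + \kappa^2c(1-c)S_0''(p_{12}) + \lambda^2(1-c)^2S_0''(p_{22}), \\
\tau(g) - \tau(g_b) &= c\,k(k-1)\,d_1^{k-2}\big(\mu c + \nu(1-c)\big)^2/2 + (1-c)\,k(k-1)\,d_2^{k-2}\big(\kappa c + \lambda(1-c)\big)^2/2.
\end{aligned} \tag{44}$$

Both the change in $s$ and the change in $\tau$ are the sum of two terms, one involving $\mu$ and $\nu$,
and the other involving $\kappa$ and $\lambda$. Let:

$$\begin{aligned}
A_1 &= \mu^2c^2S_0''(p_{11}) + \nu^2c(1-c)S_0''(p_{12}), \\
A_2 &= \kappa^2c(1-c)S_0''(p_{12}) + \lambda^2(1-c)^2S_0''(p_{22}), \\
B_1 &= c\,k(k-1)\,d_1^{k-2}\big(\mu c + \nu(1-c)\big)^2/2, \\
B_2 &= (1-c)\,k(k-1)\,d_2^{k-2}\big(\kappa c + \lambda(1-c)\big)^2/2,
\end{aligned} \tag{45}$$

so to lowest order,

$$\frac{s(g) - s(g_b)}{\tau_k(g) - \tau_k(g_b)} \approx \frac{A_1 + A_2}{B_1 + B_2}. \tag{46}$$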

For the perturbations involving only $\kappa$ and $\lambda$, the ratio $A_2/B_2$ depends only on $r = \kappa/\lambda$:

$$\frac{A_2}{B_2} = \frac{r^2cS_0''(p_{12}) + (1-c)S_0''(p_{22})}{k(k-1)\,d_2^{k-2}\big(rc + (1-c)\big)^2/2}. \tag{47}$$

We optimize by taking a derivative with respect to $r$ and setting it equal to zero, with the result
that $r = S_0''(p_{22})/S_0''(p_{12})$, independent of $c$. Since $r$ does not diverge as $c \to 0$, the limit
of $A_2/B_2$ as $c \to 0$ can be obtained by simply setting $c = 0$, giving a limiting ratio of
$2S_0''(\varepsilon)/[k(k-1)d_2^{k-2}] = \psi_k(\varepsilon,\varepsilon) < \beta$. Since the limit is less than $\beta$, the ratio must be
smaller than $\beta$ for all sufficiently small values of $c$.
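The optimization over $r$ can be checked numerically. The Python sketch below evaluates the ratio $[r^2cS_0''(p_{12}) + (1-c)S_0''(p_{22})]\,/\,[k(k-1)d_2^{k-2}(rc+(1-c))^2/2]$ and confirms that its derivative in $r$ vanishes at $r = S_0''(p_{22})/S_0''(p_{12})$. The function names and the sample values of $c$, $p_{12}$, $p_{22}$ and $k$ are arbitrary choices made for this illustration.

```python
def S0_second(w):
    # Second derivative of S0: S0''(w) = -1 / (2 w (1 - w)).
    return -0.5 / (w * (1.0 - w))

def ratio_A2_B2(r, c, p12, p22, k, d2):
    # Ratio of the entropy change A_2 to the density change B_2 for a
    # perturbation with kappa / lambda = r (the common factor
    # lambda^2 (1 - c) has been cancelled).
    num = r * r * c * S0_second(p12) + (1.0 - c) * S0_second(p22)
    den = k * (k - 1) * d2 ** (k - 2) * (r * c + (1.0 - c)) ** 2 / 2.0
    return num / den

def optimal_r(p12, p22):
    # Stationary point of the ratio in r; note it does not involve c.
    return S0_second(p22) / S0_second(p12)

# Sample (hypothetical) parameter values.
c, p12, p22, k = 0.05, 0.8, 0.3, 3
d2 = c * p12 + (1.0 - c) * p22       # degree on the second cluster
r_star = optimal_r(p12, p22)

# Central-difference derivative of the ratio at r_star.
h = 1e-5
slope = (ratio_A2_B2(r_star + h, c, p12, p22, k, d2)
         - ratio_A2_B2(r_star - h, c, p12, p22, k, d2)) / (2.0 * h)
```

Since $S_0'' < 0$, the ratio is negative, and the stationary point is the value of $r$ at which it is largest (closest to zero).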

Almost identical arguments apply to the perturbations involving only $\mu$ and $\nu$. The
optimal ratio $\mu/\nu$ is then $S_0''(p_{12})/S_0''(p_{11})$, which again cannot diverge as $c \to 0$. Thus for
small values of $c$ the dominant terms are those involving $\nu$, and the ratio $A_1/B_1$ approaches
$2S_0''(p_{12})/[k(k-1)d_1^{k-2}]$. But $d_1 \approx p_{12} \approx \tilde\varepsilon$ (with $\tilde\varepsilon = \zeta_k(\varepsilon)$), so our ratio goes to
$2S_0''(\tilde\varepsilon)/[k(k-1)\tilde\varepsilon^{k-2}] = \psi_k(\tilde\varepsilon,\tilde\varepsilon) < \beta$.

Thus there is a constant $\beta_0 < \beta$ such that $A_1 < \beta_0B_1$ and $A_2 < \beta_0B_2$, so $A_1 + A_2 <
\beta_0(B_1 + B_2)$, so

$$s(g) - s(g_b) < \beta_0\big(\tau(g) - \tau(g_b)\big). \tag{48}$$

However $\partial s/\partial\tau \approx \beta$ for changes in $c$ that preserve the bipodal structure. This means
if we perturb a bipodal graphon to maximize $s$, it is better to perturb $c$ than to make
$(\mu,\nu,\kappa,\lambda)$ nonzero. Thus $\kappa$ and $\lambda$ must both be zero, implying that there is only one Type
II cluster, and $\mu$ and $\nu$ must be zero, implying that there is only one Type I cluster.


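Step 3. We have established that the entropy-maximizing graphon is bipodal, with $p_{22} \approx \varepsilon$ and
$p_{12} \approx \zeta_k(\varepsilon)$. We now show that the form of this graphon is unique. Since the graphon is
bipodal, we consider the exact optimality equations. After eliminating $\gamma$, we have

$$\begin{aligned}
S_0'(p_{11}) &= \alpha + k\beta d_1^{k-1}, \\
S_0'(p_{12}) &= \alpha + \frac{k}{2}\beta\,(d_1^{k-1} + d_2^{k-1}), \\
S_0'(p_{22}) &= \alpha + k\beta d_2^{k-1}, \\
\frac{\partial s}{\partial c} &= \alpha\frac{\partial\varepsilon}{\partial c} + \beta\frac{\partial\tau}{\partial c}, \\
\varepsilon &= \varepsilon_0, \\
\tau &= \tau_0.
\end{aligned} \tag{49}$$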


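We use the second and third equations to solve for $\alpha$ and $\beta$:

\[
\alpha = \frac{-S_0'(p_{22})\,(d_2^{k-1} + d_1^{k-1}) + 2d_2^{k-1}S_0'(p_{12})}{d_2^{k-1} - d_1^{k-1}},
\qquad
\beta = \frac{2}{k}\,\frac{S_0'(p_{22}) - S_0'(p_{12})}{d_2^{k-1} - d_1^{k-1}}. \tag{50}
\]

Plugging this into the first equation then gives

\[ S_0'(p_{11}) - 2S_0'(p_{12}) + S_0'(p_{22}) = 0. \tag{51} \]

This leaves four equations in four unknowns, which we write as

\[ f = (0, 0, \epsilon_0, \tau_0)^T, \tag{52} \]

where

\[
\begin{aligned}
f_1 &= S_0'(p_{11}) - 2S_0'(p_{12}) + S_0'(p_{22}),\\
f_2 &= \frac{\partial S}{\partial c} - \alpha\frac{\partial \epsilon}{\partial c} - \beta\frac{\partial \tau}{\partial c},\\
f_3 &= c^2 p_{11} + 2c(1-c)p_{12} + (1-c)^2 p_{22},\\
f_4 &= c\,d_1^k + (1-c)\,d_2^k,
\end{aligned} \tag{53}
\]

and where $\alpha$ and $\beta$ are given by (50).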
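As a sanity check on the elimination of the multipliers, the following Python sketch computes $\alpha$ and $\beta$ from the second and third optimality equations and verifies that the first equation then reduces to the constraint on $p_{11}$. It assumes $S_0(w) = -\tfrac12[w\ln w + (1-w)\ln(1-w)]$, which is consistent with the explicit logarithmic formula for $N$ used later in the paper; the function names and the sample values ($k = 2$, $p_{12} = 0.6$, $p_{22} = 0.4$, $c = 0$) are illustrative choices, not taken from the paper.

```python
import math


def S0_prime(w):
    """Derivative of S0(w) = -(w ln w + (1-w) ln(1-w)) / 2 (assumed form of S0)."""
    return 0.5 * math.log((1.0 - w) / w)


def S0_prime_inverse(y):
    """Inverse of S0_prime on the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(2.0 * y))


def multipliers(p12, p22, d1, d2, k):
    """Solve the p12 and p22 optimality equations for (alpha, beta)."""
    denom = d2 ** (k - 1) - d1 ** (k - 1)
    beta = (2.0 / k) * (S0_prime(p22) - S0_prime(p12)) / denom
    alpha = (2.0 * d2 ** (k - 1) * S0_prime(p12)
             - S0_prime(p22) * (d2 ** (k - 1) + d1 ** (k - 1))) / denom
    return alpha, beta


# Illustrative values at c = 0, where d1 = p12 and d2 = p22.
k, p12, p22 = 2, 0.6, 0.4
alpha, beta = multipliers(p12, p22, d1=p12, d2=p22, k=k)

# p11 is fixed by S0'(p11) = 2 S0'(p12) - S0'(p22).
p11 = S0_prime_inverse(2.0 * S0_prime(p12) - S0_prime(p22))

# Residuals of the three pointwise optimality equations (d1 = p12, d2 = p22).
res_11 = S0_prime(p11) - (alpha + k * beta * p12 ** (k - 1))
res_12 = S0_prime(p12) - (alpha + 0.5 * k * beta * (p12 ** (k - 1) + p22 ** (k - 1)))
res_22 = S0_prime(p22) - (alpha + k * beta * p22 ** (k - 1))
```

All three residuals vanish up to rounding, which reflects the fact that twice the second equation minus the third reproduces the first.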

We know a solution when $\tau_0 = \epsilon_0^k$, namely $p_{22} = \epsilon_0$, $p_{12} = \zeta_k(\epsilon_0)$, $c = 0$ and
$p_{11} = (S_0')^{-1}(2S_0'[\zeta_k(\epsilon_0)] - S_0'(\epsilon_0))$. We will show that $df$ has non-zero determinant at this point. By
the inverse function theorem, this implies that, when $\tau_0$ is close to $\epsilon_0^k$, there is only one value
of $(p_{11}, p_{12}, p_{22}, c)$ close to this point for which $f(p_{11}, p_{12}, p_{22}, c) = (0, 0, \epsilon_0, \tau_0)^T$. Moreover,
the parameters $(p_{11}, p_{12}, p_{22}, c)$ depend analytically on $\epsilon_0$ and $\tau_0$. This will complete the
proof. (Note that we have reordered the variables by listing $c$ last.)

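The derivatives of $f_1$, $f_3$, and $f_4$ are:

\[
\begin{aligned}
df_1 &= (S_0''(p_{11}),\, -2S_0''(p_{12}),\, S_0''(p_{22}),\, 0),\\
df_3 &= (c^2,\, 2c(1-c),\, (1-c)^2,\, 2cp_{11} + 2(1-2c)p_{12} - 2(1-c)p_{22}),\\
df_4 &= (kc^2 d_1^{k-1},\, kc(1-c)(d_1^{k-1} + d_2^{k-1}),\, k(1-c)^2 d_2^{k-1},\\
     &\qquad d_1^k - d_2^k + kc\,d_1^{k-1}(p_{11} - p_{12}) + k(1-c)\,d_2^{k-1}(p_{12} - p_{22})).
\end{aligned} \tag{54}
\]

Evaluating at $c = 0$ gives

\[
\begin{aligned}
df_1 &= (S_0''(p_{11}),\, -2S_0''(p_{12}),\, S_0''(p_{22}),\, 0),\\
df_3 &= (0,\, 0,\, 1,\, 2p_{12} - 2p_{22}),\\
df_4 &= (0,\, 0,\, kp_{22}^{k-1},\, p_{12}^k - p_{22}^k + kp_{22}^{k-1}(p_{12} - p_{22})).
\end{aligned} \tag{55}
\]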

$df$ is block triangular, with $2 \times 2$ blocks. The lower right block has determinant
$p_{12}^k - p_{22}^k - kp_{22}^{k-1}(p_{12} - p_{22}) = D(p_{22}, p_{12})$, which is non-zero when $p_{12} \ne p_{22}$, i.e. when $\epsilon_0 \ne (k-1)/k$.
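The block structure at $c = 0$ can be checked numerically. The sketch below differentiates the edge density and the $k$-star density of a bipodal graphon by central differences and compares the determinant of the lower right $2 \times 2$ block with the closed form $D(p_{22}, p_{12})$. The helper names, the step size and the sample point are assumptions made for illustration only.

```python
def f3(p11, p12, p22, c):
    """Edge density of a bipodal graphon with cluster sizes c and 1 - c."""
    return c * c * p11 + 2 * c * (1 - c) * p12 + (1 - c) ** 2 * p22


def f4(p11, p12, p22, c, k):
    """k-star density c * d1^k + (1 - c) * d2^k."""
    d1 = c * p11 + (1 - c) * p12
    d2 = c * p12 + (1 - c) * p22
    return c * d1 ** k + (1 - c) * d2 ** k


def central_diff(fun, args, index, step=1e-6):
    """Central-difference partial derivative of fun in argument number index."""
    up, down = list(args), list(args)
    up[index] += step
    down[index] -= step
    return (fun(*up) - fun(*down)) / (2 * step)


def lower_right_det(p11, p12, p22, k):
    """Determinant of d(f3, f4)/d(p22, c), evaluated at c = 0."""
    args = (p11, p12, p22, 0.0)

    def g4(a, b, d, c):
        return f4(a, b, d, c, k)

    d3_dp22 = central_diff(f3, args, 2)
    d3_dc = central_diff(f3, args, 3)
    d4_dp22 = central_diff(g4, args, 2)
    d4_dc = central_diff(g4, args, 3)
    return d3_dp22 * d4_dc - d3_dc * d4_dp22


def D_closed_form(p22, p12, k):
    """p12^k - p22^k - k p22^(k-1) (p12 - p22)."""
    return p12 ** k - p22 ** k - k * p22 ** (k - 1) * (p12 - p22)


numeric = lower_right_det(0.8, 0.7, 0.4, 3)
exact = D_closed_form(0.4, 0.7, 3)
```

The same helper also shows that the $p_{11}$ and $p_{12}$ columns of $df_3$ and $df_4$ vanish at $c = 0$, which is what makes the matrix block triangular.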

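Also $\partial f_2/\partial p_{11} = 0$ when $c = 0$, since $\alpha$ and $\beta$ are independent of $p_{11}$ (when $c = 0$) and
since $\partial^2 S/\partial c\,\partial p_{11}$, $\partial^2 \epsilon/\partial c\,\partial p_{11}$ and $\partial^2 \tau/\partial c\,\partial p_{11}$ are all $O(c)$. As a result,

\[ \det(df) = S_0''(p_{11})\,\frac{\partial f_2}{\partial p_{12}}\,D(p_{22}, p_{12}). \tag{56} \]

So as long as $p_{12} \ne p_{22}$ (i.e. as long as $\epsilon_0 \ne (k-1)/k$), everything boils down to
computing $\partial f_2/\partial p_{12}$ at $c = 0$ and seeing that it is nonzero. We compute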


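\[
\begin{aligned}
\frac{\partial \beta}{\partial p_{12}}
&= \frac{2}{k}\,\frac{(p_{22}^{k-1} - p_{12}^{k-1})(-S_0''(p_{12})) - (S_0'(p_{22}) - S_0'(p_{12}))(-(k-1)p_{12}^{k-2})}{(p_{22}^{k-1} - p_{12}^{k-1})^2}\\
&= \frac{2}{k}\,\frac{(k-1)p_{12}^{k-2}(S_0'(p_{22}) - S_0'(p_{12})) - (p_{22}^{k-1} - p_{12}^{k-1})S_0''(p_{12})}{(p_{22}^{k-1} - p_{12}^{k-1})^2}
\end{aligned}
\]

at $c = 0$. We will show separately that this quantity is nonzero.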
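Since $\alpha = S_0'(p_{22}) - k\beta d_2^{k-1}$,

\[
\frac{\partial \alpha}{\partial p_{12}} = -kd_2^{k-1}\frac{\partial \beta}{\partial p_{12}} - k(k-1)d_2^{k-2}c\beta \;\to\; -kp_{22}^{k-1}\frac{\partial \beta}{\partial p_{12}},
\]

where $\to$ denotes a limit as $c \to 0$. We also compute

\[ \frac{\partial^2 S}{\partial c\,\partial p_{12}} = 2(1-2c)S_0'(p_{12}) \;\to\; 2S_0'(p_{12}), \tag{57} \]

\[ \frac{\partial^2 \epsilon}{\partial c\,\partial p_{12}} = 2(1-2c) \;\to\; 2, \tag{58} \]

\[ \frac{\partial^2 \tau}{\partial c\,\partial p_{12}} = k(1-2c)(d_1^{k-1} + d_2^{k-1}) + O(c) \;\to\; k(p_{12}^{k-1} + p_{22}^{k-1}). \tag{59} \]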
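Finally we combine everything:

\[
\begin{aligned}
\left.\frac{\partial f_2}{\partial p_{12}}\right|_{c=0}
&= \frac{\partial^2 S}{\partial c\,\partial p_{12}} - \alpha\frac{\partial^2 \epsilon}{\partial c\,\partial p_{12}} - \beta\frac{\partial^2 \tau}{\partial c\,\partial p_{12}} - \frac{\partial \alpha}{\partial p_{12}}\frac{\partial \epsilon}{\partial c} - \frac{\partial \beta}{\partial p_{12}}\frac{\partial \tau}{\partial c}\\
&= 2S_0'(p_{12}) - 2\alpha - k\beta\,(p_{12}^{k-1} + p_{22}^{k-1})\\
&\quad + \left(kp_{22}^{k-1}(2p_{12} - 2p_{22}) - \left(p_{12}^k - p_{22}^k + kp_{22}^{k-1}(p_{12} - p_{22})\right)\right)\frac{\partial \beta}{\partial p_{12}}.
\end{aligned} \tag{60}
\]

The terms not involving $\partial \beta/\partial p_{12}$ all cancel, by the second variational equation, and we are
left with

\[ \frac{\partial f_2}{\partial p_{12}} = -D(p_{22}, p_{12})\,\frac{\partial \beta}{\partial p_{12}}. \tag{61} \]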

Finally, we need to show that $\partial \beta/\partial p_{12} \ne 0$. Since $p_{12}$ maximizes $\psi_k(p_{22}, p_{12})$ (for fixed
$p_{22}$), we must have (referring to the notation of the proof of Theorem 3.3) $(N/D)' = 0$,
or equivalently $N'/D' = N/D$, where we write $\psi_k = N/D$, as above. But $\beta = N'/D'$. If
$\partial \beta/\partial p_{12}$ were equal to zero, then we would have $N''/D'' = N'/D'$. But we have previously
shown that it is impossible to simultaneously have $N/D = N'/D' = N''/D''$, except at
$p_{12} = p_{22} = (k-1)/k$, so $\partial \beta/\partial p_{12}$ must be nonzero whenever $\epsilon_0 \ne (k-1)/k$. This makes
$\det(df)$ nonzero at $(p_{11}, \zeta_k(\epsilon_0), \epsilon_0, 0)$, so the solutions near this point are unique and analytic
in $(\epsilon, \tau)$. □


5 Theorem 1.1 for k-starlike graphs

Now suppose that $H$ is a $k$-starlike graph with $\ell$ edges, and with $n_k$ vertices of degree $k$, and
let $\tau$ be the density of $H$ and $\tau_k$ be the density of $k$-stars. Our first result relates $\Delta\tau := \tau - \epsilon^\ell$
to $\Delta\tau_k := \tau_k - \epsilon^k$.

Lemma 5.1. If $g$ is an entropy-maximizing graphon for $(\epsilon, \tau)$ with $\tau > \epsilon^\ell$, then
$\Delta\tau = n_k\epsilon^{\ell-k}\Delta\tau_k + O(\Delta\tau_k^{3/2})$.

Proof. Writing $g(x, y) = \epsilon + \Delta g(x, y)$, we expand $\tau$ as a polynomial in $\Delta g$:

\[ \tau = \int \prod_{(i,j)} g(x_i, x_j)\,dx = \int \prod_{(i,j)} (\epsilon + \Delta g(x_i, x_j))\,dx, \tag{62} \]

where there is a variable $x_i$ for each vertex of $H$ and the product is over all edges in $H$.

The 0-th order term is $\epsilon^\ell$. The first-order term is identically zero, since $\iint \Delta g(x, y)\,dx\,dy =
\Delta\epsilon = 0$. When looking at higher-order expansions, there are some terms that come from
having all $\Delta g$'s along edges that share a single vertex of degree $k$. These terms also appear
in the expansion of $\tau_k$, so the sum of those terms is exactly $\epsilon^{\ell-k}\Delta\tau_k$. Since all vertices have
degree $k$ or 1, summing these terms gives $n_k\epsilon^{\ell-k}\Delta\tau_k$.

What remains are terms where the $\Delta g$'s refer to edges that do not all share a vertex. We
bound these in turn. In each case, let $\{e_j\}$ be the set of edges that correspond to factors of
$\Delta g$.

• If one of the $e_j$'s is disconnected from the rest, then the integral is exactly zero. So we
can assume that all connected components of $\{e_j\}$ contain at least two edges.

• If there is more than one connected component, then we get a product of factors, one
for each connected component. Each factor is bounded by a constant times $\|\Delta g\|^2$, so
the product is $O(\|\Delta g\|^4)$.

• If there is only one connected component, whose edges do not all share a vertex,
then $\{e_j\}$ either contains a triangle or a chain of three consecutive edges. We bound
such terms by taking absolute values of all the $\Delta g$'s and setting all terms other
than the three edges in the triangle or 3-chain to 1. The result is either a constant
times $\iiiint |\Delta g(w,x)|\,|\Delta g(x,y)|\,|\Delta g(y,z)|\,dw\,dx\,dy\,dz$, or a constant times
$\iiint |\Delta g(x,y)|\,|\Delta g(y,z)|\,|\Delta g(z,x)|\,dx\,dy\,dz$, either of which in turn is bounded by a
constant times $\|\Delta g\|^3$. (If we think of $|\Delta g|$ as the integral kernel of an operator
$L$ on $L^2(0,1)$, then the integral for a 3-chain is the expectation of $L^3$ in a particular
state, and the integral for a triangle is the trace of $L^3$. Both are bounded by
$\mathrm{Tr}(L^2)^{3/2} = \|\,|\Delta g|\,\|^3 = \|\Delta g\|^3$.)

Since $\Delta\tau_k$ scales as $\|\Delta g\|^2$, all the corrections to the approximation $\Delta\tau \approx n_k\epsilon^{\ell-k}\Delta\tau_k$ are
$O(\Delta\tau_k^{3/2})$ or smaller. □
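For a concrete feel for this expansion, the following Python sketch compares the triangle density with the 2-star density on a bipodal graphon with a small first cluster. For the triangle ($\ell = 3$, $k = 2$, $n_2 = 3$) the expansion has only one term beyond $3\epsilon\,\Delta\tau_2$, namely the cubic term $\int \Delta g\,\Delta g\,\Delta g$ around the triangle, so the remainder can be checked exactly. The graphon parameters below are arbitrary illustrative values, not an entropy maximizer, and the function names are hypothetical.

```python
def densities(c, p11, p12, p22):
    """Edge, 2-star and triangle densities of a bipodal graphon."""
    w = [c, 1.0 - c]
    p = [[p11, p12], [p12, p22]]
    idx = range(2)
    edge = sum(w[i] * w[j] * p[i][j] for i in idx for j in idx)
    deg = [sum(w[j] * p[i][j] for j in idx) for i in idx]
    two_star = sum(w[i] * deg[i] ** 2 for i in idx)
    triangle = sum(w[i] * w[j] * w[m] * p[i][j] * p[j][m] * p[m][i]
                   for i in idx for j in idx for m in idx)
    return edge, two_star, triangle


def cubic_term(eps, c, p11, p12, p22):
    """Integral of Delta g around a triangle, with Delta g = g - eps."""
    w = [c, 1.0 - c]
    dg = [[p11 - eps, p12 - eps], [p12 - eps, p22 - eps]]
    idx = range(2)
    return sum(w[i] * w[j] * w[m] * dg[i][j] * dg[j][m] * dg[m][i]
               for i in idx for j in idx for m in idx)


eps, c, p11, p12 = 0.4, 0.01, 0.9, 0.8
# Choose p22 so that the edge density is exactly eps.
p22 = (eps - c * c * p11 - 2 * c * (1 - c) * p12) / (1 - c) ** 2

edge, tau2, tau_tri = densities(c, p11, p12, p22)
d_tau2 = tau2 - eps ** 2
d_tau_tri = tau_tri - eps ** 3
ratio = d_tau_tri / (3 * eps * d_tau2)        # n_2 * eps^(l - k) = 3 * eps
remainder = d_tau_tri - 3 * eps * d_tau2      # equals the cubic term
cubic = cubic_term(eps, c, p11, p12, p22)
```

With these values the ratio is within about one percent of 1, and the remainder agrees with the cubic term to rounding error.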

5.1 Proof of Theorem 1.1 

Since $\Delta\tau$ is proportional to $\Delta\tau_k$ (plus small errors), the problem of optimizing $\Delta s/\Delta\tau$ is a
small perturbation of the problem of optimizing $\Delta s/\Delta\tau_k$, or equivalently optimizing $\Delta s$ for
fixed $\Delta\tau_k$, which we solved in the last section. Since that problem has a unique optimizer,
any optimizer for $\Delta s/\Delta\tau$ must come close to optimizing $\Delta s/\Delta\tau_k$, and so must be close to
the bipodal graphon derived in Theorem 4.1.

We can thus write $g = g_b + \Delta g_f$, as in the last steps of the proof of Theorem 4.1, where
$g_b = \epsilon + \Delta g_b$ is a bipodal graphon with $p_{22} \approx \epsilon$ and $p_{12} \approx \zeta_k(\epsilon)$ and where $\Delta g_f$ is a function
that averages to zero on each quadrant of $g_b$.

Lemma 5.2. The function $\Delta g_f$ is pointwise small. That is, as $\tau \to \epsilon^\ell$, $\Delta g_f$ goes to zero in
sup-norm.

Proof of lemma. Since we are no longer in the setting where the entropy maximizer is proven
to be multipodal, we cannot use the equations (38) directly. However, we can still apply
the method of Lagrange multipliers to pointwise variations of the graphon. (See [6] for a
rigorous justification.) These variational equations are

\[ \frac{\delta s}{\delta g(x,y)} = \alpha + \beta\,\frac{\delta \tau}{\delta g(x,y)}. \tag{63} \]

We need to compute $\delta\tau/\delta g$ and show that it is nearly constant on each quadrant. Since $\alpha$
and $\beta$ are constants, this would imply that $g(x,y)$ is nearly constant on each quadrant, and
hence that $\Delta g_f$ is pointwise small. Let $g_0(x,y) = \epsilon$.

Since $\|\Delta g\|$ is small (where $\Delta g = g - g_0 = \Delta g_b + \Delta g_f$), we can find a small constant
$a = o(1)$ such that, for all $x$ outside a set $U \subset [0,1]$ of measure $a$, $\int |\Delta g(x, y)|\,dy < a$. (This
set $U$ is essentially what we previously called the Type I clusters, but at this stage of the
argument we are not assuming a multipodal structure. Rather, we are just using the fact
that $\tau - \epsilon^\ell = O(\|\Delta g\|^2)$.)

The functional derivative $\delta\tau/\delta g$ has a diagrammatic expansion similar to the expansion
of $\tau$. For each edge of $H$, we get a contribution by deleting the edge, assigning the values
$x$ and $y$ to the endpoints of the edge, and integrating over the values of all other vertices.
Since $U$ is small, we can estimate $\delta\tau/\delta g$ to within $o(1)$ by restricting the integral to $(U^c)^{v-2}$,
where $v$ is the number of vertices in $H$ and $U^c$ is the complement of $U$. This implies that
terms involving $\Delta g$ can only contribute non-negligibly on edges connected to $x$ or to $y$.
Furthermore, they can only contribute when attached to $x$ if $x \in U$, and can only contribute
when attached to $y$ if $y \in U$.

We now begin a bootstrap. We will show that $\delta\tau/\delta g$ is nearly constant on each of the quadrants
$U^c \times U^c$, $U \times U^c$, $U^c \times U$ and $U \times U$ in turn. Each step will show that $g$ is nearly constant
on that quadrant, which will help us prove that $\delta\tau/\delta g$ is nearly constant on the next quadrant.

If $x$ and $y$ are both in $U^c$, then the contributions of the terms involving $\Delta g$ are negligible,
so $\delta\tau/\delta g(x, y)$ can be computed, to within a small error, using the approximation $g(x, y) \approx \epsilon$.
But when $g(x,y) = \epsilon$, $\delta\tau/\delta g(x,y)$ is independent of $x$ and $y$. Since $\delta\tau/\delta g(x,y)$ is nearly
constant on $U^c \times U^c$, equation (63) implies that $g$ is nearly constant on $U^c \times U^c$. In particular,
$\Delta g_f$ is pointwise small on $U^c \times U^c$.

Next suppose that $y \in U^c$ and $x \in U$. Then $\delta\tau/\delta g(x,y)$ is nearly independent of $y$,
so $g(x,y)$ is nearly independent of $y$, and is nearly equal to $d(x)$. But then the integrals
involved in computing $\delta\tau/\delta g(x,y)$ are easy, where we use $g_0 + \Delta g$ on the edges connected
to $x$, $g_0$ on all other edges, and only integrate over $(U^c)^{v-2}$. If the degree of $x$ is $k$, then the
edges connected to $x$ contribute $d(x)^{k-1}\epsilon^{\ell-k}$. Summing over edges, and symmetrizing over
the assignment of $x$ and $y$ to the two endpoints, we obtain the approximation

(64)

Up to an overall factor of $n_k\epsilon^{\ell-k}$, this is the same functional derivative as for a $k$-star. This
also applies if $x \in U^c$, except that in the latter case $d(x) \approx \epsilon$, and also applies if $x \in U^c$ and
$y \in U$.

In other words, we can use the approximation (64) in (63) whenever either $x$ or $y$ (or
both) is in $U^c$. This implies that the integrated equations (39) apply for all $x$ (with $d_i$
replaced by $d(x)$, and with $\beta$ scaled up by $n_k\epsilon^{\ell-k}$). Following the exact same reasoning as
in the proof of Theorem 4.1, we obtain that $d(x)$ only takes on 2 possible values (up to
$o(1)$ errors). We then define Type I and Type II points, depending on whether the degree
function is close to $\zeta_k(\epsilon)$ or $\epsilon$, respectively, so that $U$ is precisely the set of Type I points.
Our graphon is then nearly constant on the I-II, II-I and II-II quadrants.

We still need to show that the graphon is nearly constant in the I-I quadrant. Suppose
that $x$ and $y$ are in $U$. In computing $\delta\tau/\delta g(x, y)$, we approximate our integral by integrating
over $(U^c)^{v-2}$. But if $z \in U^c$, then $g(x,z)$ is (nearly) independent of $x$, since we have just
established that $g$ is nearly constant on the I-II quadrant. Thus $\delta\tau/\delta g$ (which is obtained
by integrating products of terms $g(x,z)$) is nearly independent of $x$. Likewise, it is nearly
independent of $y$, implying that $g(x,y)$ is nearly constant on the I-I quadrant.

Note, by the way, that the approximation (64) does not apply in the I-I quadrant; in that
case $\delta\tau/\delta g$ contains terms with powers of both $d(x)$ and $d(y)$. However, that approximation
is not needed for our proof, since the I-I quadrant only contributes $O(c)$ to the integrated
equations (39). □


Returning to the proof of Theorem 1.1, we need to compare $s(g_b + \Delta g_f) - s(g_b)$ to
$\tau(g_b + \Delta g_f) - \tau(g_b)$.

As before, we expand $\tau(g)$ as the integral of a polynomial in $g$, obtained by assigning
$g_0 + \Delta g_b + \Delta g_f$ to each edge of $H$ and integrating. The difference between $\tau(g_b + \Delta g_f)$ and
$\tau(g_b)$ consists of terms with at least one $\Delta g_f$. However, the terms with exactly one $\Delta g_f$
are identically zero, since $g_b$ is constant on quadrants, and $\Delta g_f$ averages to zero on each
quadrant. Furthermore, terms for which all of the $\Delta g_b$'s and $\Delta g_f$'s share a vertex are exactly
what we would get from the approximation $\Delta\tau \approx n_k\epsilon^{\ell-k}\Delta\tau_k$. Any term that distinguishes
between $\Delta\tau$ and $n_k\epsilon^{\ell-k}\Delta\tau_k$ must have at least two $\Delta g_f$'s and either a third $\Delta g_f$ or a $\Delta g_b$,
forming either a 3-chain, a triangle, or two connected $\Delta g_f$'s and a disconnected $\Delta g_b$.
Let $\Delta g_f'(x,y) = |\Delta g_f(x,y)|$, and let

\[
\Delta g_b'(x,y) = \begin{cases} 2c & x, y \in \mathrm{II},\\ 1 & \text{otherwise.} \end{cases} \tag{65}
\]


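This is conveniently expressed in terms of outer products. Let $|1\rangle \in L^2([0,1])$ be the constant
function 1, and let $|\omega\rangle$ be the function

\[
\omega(x) = \begin{cases} 0 & x < c,\\ 1 & x > c. \end{cases} \tag{66}
\]

Then

\[
\begin{aligned}
\Delta g_b' &= |1\rangle\langle 1| - |\omega\rangle\langle\omega| + 2c\,|\omega\rangle\langle\omega|\\
&= |1\rangle\langle 1-\omega| + |1-\omega\rangle\langle\omega| + 2c\,|\omega\rangle\langle\omega|.
\end{aligned} \tag{67}
\]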
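The outer-product form of this kernel can be checked on a grid. The Python sketch below evaluates the piecewise definition and the three-term outer-product sum at midpoints of a uniform partition of $[0,1]$, and also confirms that the squared $L^2$ norm of $1 - \omega$ equals $c$. The grid size and the value $c = 0.25$ are arbitrary illustrative choices.

```python
def omega(x, c):
    """Indicator of the second (Type II) cluster: 0 for x < c, 1 for x > c."""
    return 0.0 if x < c else 1.0


def kernel_piecewise(x, y, c):
    """2c when both points lie in the second cluster, 1 otherwise."""
    return 2.0 * c if (x > c and y > c) else 1.0


def kernel_outer(x, y, c):
    """|1><1 - w| + |1 - w><w| + 2c |w><w| evaluated at (x, y)."""
    wx, wy = omega(x, c), omega(y, c)
    return 1.0 * (1.0 - wy) + (1.0 - wx) * wy + 2.0 * c * wx * wy


n, c = 100, 0.25
grid = [(i + 0.5) / n for i in range(n)]   # midpoints, so x == c never occurs

max_gap = max(abs(kernel_piecewise(x, y, c) - kernel_outer(x, y, c))
              for x in grid for y in grid)

# Squared L^2 norm of 1 - omega, approximated by a Riemann sum.
norm_sq = sum((1.0 - omega(x, c)) ** 2 for x in grid) / n
```

The two kernels agree at every grid point, and the norm computation reproduces $\|1 - \omega\|^2 = c$.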

Note that $|\Delta g_b(x, y)| \le \Delta g_b'(x, y)$ for all $x, y \in (0,1)$. To see this, the only issue is what
happens when $(x, y)$ is in the II-II quadrant, since otherwise we trivially have $|\Delta g_b| \le 1$.
Since $\epsilon(g)$ is fixed, $(1-c)^2$ times $\Delta g_b(x, y)$ for $x, y > c$ equals minus the integral of $\Delta g_b$
over the other three quadrants. But the area of those three quadrants is $2c - c^2 < 2c$, and
the biggest possible value of $|\Delta g_b|$ is $\max(\epsilon, 1-\epsilon) < 1$, so $\int |\Delta g_b|$ (integrated over the
I-I, I-II and II-I quadrants) is strictly less than $2c + O(c^2)$, and so $|\Delta g_b|$ on the II-II quadrant
is bounded by $2c$ for small $c$ (note that $O(c^2)$ errors are negligible).

We obtain upper bounds on the contributions of the relevant terms in the expansion of
$\tau$ by replacing three $\Delta g_f(x,y)$'s and $\Delta g_b(x,y)$'s with $\Delta g_f'(x,y)$ and $\Delta g_b'(x,y)$, respectively,
and replacing all other terms with 1.

Since all graphons are symmetric, hence Hermitian, their operator norms are bounded
by their $L^2$ norms, so for any 3-chain

{l\Ag[Ag'^Ag'y\l) < ||Ac/;||||A^^||||A^^||. (68) 

Since $\|\Delta g_b'\|$ and $\|\Delta g_f'\|$ are both $o(1)$ (more precisely, $O(\sqrt{\tau - \epsilon^\ell})$), the contribution of any
3-chain is bounded by an $o(1)$ constant times $\|\Delta g_f\|^2$.

As for triangles, $\mathrm{Tr}(\Delta g_f'^{\,3}) \le \mathrm{Tr}(\Delta g_f'^{\,2})^{3/2} = \|\Delta g_f\|^3$. Finally, we must estimate the trace of
$\Delta g_f'\,\Delta g_f'\,\Delta g_b'$. But this trace is

\[ \langle 1-\omega|\Delta g_f'\Delta g_f'|1\rangle + \langle\omega|\Delta g_f'\Delta g_f'|1-\omega\rangle + 2c\,\langle\omega|\Delta g_f'\Delta g_f'|\omega\rangle. \tag{69} \]

Since $\|1 - \omega\| = \sqrt{c}$, the total is bounded by $(2\sqrt{c} + 2c)\|\Delta g_f\|^2$.

The upshot is that the ratio of $s(g_b + \Delta g_f) - s(g_b)$ and $\tau(g_b + \Delta g_f) - \tau(g_b)$ is the same as
that computed for $k$-stars (up to an overall factor of $n_k\epsilon^{\ell-k}$), plus an $o(1)$ correction. But
that ratio was bounded by a constant $\beta_0 < \beta$. Restricting attention to values of $\tau$ for which
the correction is smaller than $(\beta - \beta_0)/2$, we still obtain the result that having a non-zero
$\Delta g_f$ is a less efficient way of generating additional $\tau$ than simply changing $c$. Thus the
optimizing graphon is exactly bipodal.

Once bipodality is established, uniqueness follows exactly as in the proof of Theorem
4.1. The difference between $\Delta\tau$ and $n_k\epsilon^{\ell-k}\Delta\tau_k$ is of order $\Delta\tau_k^{3/2}$, by Lemma 5.1, and so does not affect the
linearization of the optimality equations at $c = 0$.

6 Linear combinations of k-stars

We proved Theorem 1.1 by first showing that $k$-star models have the desired behavior,
and then showing that, for an arbitrary $k$-starlike graph $H$, $\Delta\tau$ is well-approximated by a
multiple of $\Delta\tau_k$, so the model with densities of edges and $H$ behaves essentially the same as
a model with densities of edges and $k$-stars.

To prove Theorem 1.2, we consider in this section a family of models in which we can
prove bipodality and uniqueness of entropy maximizers directly, as we did for $k$-stars. In the
next section, we will show how to approximate a model with an arbitrary $H$ with a model
in this family.

Let $h(x) = \sum_{k \ge 1} a_k x^k$ be a polynomial with non-negative coefficients and degree $\ge 2$.
Let $\tau = \sum_k a_k\tau_k$, and consider graphs with fixed edge density $\epsilon$ and fixed $\tau$. In [6] it was
proved that the entropy-maximizing graphons in such models are always multipodal.

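Most of the analysis of $k$-star models carries over to positive linear combinations, and so
will only be sketched briefly. We will provide complete details where the arguments differ.
In analogy to our earlier development, let $\psi(\epsilon, \tilde\epsilon) = N/D$, where

\[
\begin{aligned}
N(\epsilon, \tilde\epsilon) &= 2\,[S_0(\tilde\epsilon) - S_0(\epsilon) - (\tilde\epsilon - \epsilon)S_0'(\epsilon)],\\
D(\epsilon, \tilde\epsilon) &= h(\tilde\epsilon) - h(\epsilon) - (\tilde\epsilon - \epsilon)h'(\epsilon).
\end{aligned} \tag{70}
\]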

Since $h''(x)$ is positive for $x > 0$, $D$ is only zero when $\tilde\epsilon = \epsilon$, and we fill in that removable
singularity in $\psi$ by defining $\psi(\epsilon, \epsilon) = 2S_0''(\epsilon)/h''(\epsilon)$.
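The function $\psi = N/D$ is easy to explore numerically. The Python sketch below evaluates it for a polynomial $h$ given as a dictionary of coefficients, fills in the removable singularity at $\tilde\epsilon = \epsilon$, and locates the maximizer over $\tilde\epsilon$ by a simple grid search. It assumes $S_0(w) = -\tfrac12[w\ln w + (1-w)\ln(1-w)]$; the grid search, the helper names and the test case ($h(x) = x^2$, $\epsilon = 0.4$) are illustrative choices rather than the method used in the paper.

```python
import math


def S0(w):
    return -0.5 * (w * math.log(w) + (1 - w) * math.log(1 - w))


def S0_prime(w):
    return 0.5 * math.log((1 - w) / w)


def S0_second(w):
    return -0.5 / (w * (1 - w))


def poly(coeffs, x, order=0):
    """order-th derivative of h(x) = sum_k coeffs[k] * x^k."""
    total = 0.0
    for k, a in coeffs.items():
        if k >= order:
            falling = math.prod(range(k - order + 1, k + 1))  # k!/(k-order)!
            total += a * falling * x ** (k - order)
    return total


def psi(coeffs, eps, t):
    """psi(eps, t) = N/D, with the removable singularity at t = eps filled in."""
    if abs(t - eps) < 1e-9:
        return 2.0 * S0_second(eps) / poly(coeffs, eps, 2)
    N = 2.0 * (S0(t) - S0(eps) - (t - eps) * S0_prime(eps))
    D = poly(coeffs, t) - poly(coeffs, eps) - (t - eps) * poly(coeffs, eps, 1)
    return N / D


def stationarity_gap(coeffs, eps, t):
    """N'/D' - N/D, which vanishes where psi is stationary in t (t != eps)."""
    N_prime = 2.0 * (S0_prime(t) - S0_prime(eps))
    D_prime = poly(coeffs, t, 1) - poly(coeffs, eps, 1)
    return N_prime / D_prime - psi(coeffs, eps, t)


def argmax_psi(coeffs, eps, n=999):
    """Grid search for the maximizer of psi(eps, .) on (0, 1)."""
    grid = [(i + 1) / (n + 1) for i in range(n)]
    return max(grid, key=lambda t: psi(coeffs, eps, t))


two_star = {2: 1.0}                 # h(x) = x^2
best = argmax_psi(two_star, 0.4)
```

For the 2-star polynomial with $\epsilon = 0.4$ the search lands on $\tilde\epsilon \approx 0.6$, where $N'/D' = N/D$ holds, and values of $\psi$ near $\tilde\epsilon = \epsilon$ approach $2S_0''(\epsilon)/h''(\epsilon)$.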

Theorem 6.1. For all but finitely many values of $\epsilon$, there is a $\tau_0 > h(\epsilon)$ such that, for
$\tau \in (h(\epsilon), \tau_0)$, the entropy-optimizing graphon is bipodal and unique, with data varying
analytically with $\epsilon$ and $\tau$. As $\tau$ approaches $h(\epsilon)$ from above, $p_{22} \to \epsilon$, $p_{12}$ approaches a
point $\tilde\epsilon$ where $\psi'(\epsilon, \tilde\epsilon) = 0$, $p_{11}$ satisfies $S_0'(p_{11}) = 2S_0'(p_{12}) - S_0'(p_{22})$ and $c \to 0$ as $O(\Delta\tau)$.
Proof. For a multipodal graphon, $\tau(g) = \sum_i c_i h(d_i)$. After eliminating $\gamma$, the optimality
equations become

\[ S_0'(p_{ij}) = \alpha + \beta\,(h'(d_i) + h'(d_j))/2, \tag{71} \]

\[ 2\sum_{j=1}^{M} c_j\,(S_0(p_{ij}) - S_0(p_{Mj})) = 2\alpha(d_i - d_M) + \beta\Big[h(d_i) - h(d_M) + \sum_{j=1}^{M} c_j h'(d_j)(p_{ij} - p_{Mj})\Big]. \tag{72} \]

As before, we distinguish between Type I clusters that are small and Type II clusters that
have $d_i \approx \epsilon$. Summing the optimality equations over $j$ of Type II, and approximating $d_j$ by
$\epsilon$, we obtain the equations

\[ S_0'(d_i) = \alpha + \beta\,(h'(d_i) + h'(\epsilon))/2, \tag{73} \]

\[ S_0(d_i) - S_0(\epsilon) = \alpha(d_i - \epsilon) + \tfrac{\beta}{2}\,[h(d_i) - h(\epsilon) + h'(\epsilon)(d_i - \epsilon)], \tag{74} \]

that are accurate to within $o(1)$. We use the first equation, with $i = M$ (a Type II cluster),
to solve for $\alpha$, and plug it into the equations for $i < M$ to get

\[ 2\,(S_0'(d_i) - S_0'(\epsilon)) = \beta\,(h'(d_i) - h'(\epsilon)), \tag{75} \]

\[ 2\,[S_0(d_i) - S_0(\epsilon) - S_0'(\epsilon)(d_i - \epsilon)] = \beta\,[h(d_i) - h(\epsilon) - h'(\epsilon)(d_i - \epsilon)], \tag{76} \]

again to within $o(1)$. As before in the proof of Theorem 4.1, this implies that either $d_i \approx \epsilon$
or that $\psi(\epsilon, d_i)$ is maximized with respect to $d_i$.

Unlike in the $k$-star case, it is not true that $\psi'(\epsilon, \tilde\epsilon) = 0$ has a unique solution for each $\epsilon$.
However, it remains true that $\psi(\epsilon, \tilde\epsilon)$ has a unique global maximizer (w.r.t. $\tilde\epsilon$) for all but
finitely many values of $\epsilon$. Since the equations defining multiple maxima are analytic, they
must be satisfied either for all $\epsilon$ or for only finitely many $\epsilon$. But it is straightforward to
check that there is only one maximizer when $\epsilon$ is sufficiently small, since then $h(\epsilon)$ and $h'(\epsilon)$
are dominated by the lowest order term in the polynomial.

Thus, for all but finitely many values of $\epsilon$, the values of $d_i$ must all either approximate $\epsilon$ or
the unique value of $\tilde\epsilon$ that maximizes $\psi(\epsilon, \tilde\epsilon)$. This allows for a re-segregation of the clusters
into Type I (with $d_i$ close to $\tilde\epsilon$) and Type II (with $d_i$ close to $\epsilon$) and yields a graphon that
is approximately bipodal. Step 2 of the proof of Theorem 4.1, proving that the optimizing
graphon is exactly bipodal with data of the desired form, then proceeds exactly as before.

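What remains is showing that the optimizing graphon is unique by linearizing the exact
optimality equations for bipodal graphons near $c = 0$. These equations are:

\[
\begin{aligned}
S_0'(p_{11}) &= \alpha + \beta h'(d_1),\\
S_0'(p_{12}) &= \alpha + \beta\,(h'(d_1) + h'(d_2))/2,\\
S_0'(p_{22}) &= \alpha + \beta h'(d_2),\\
\frac{\partial S}{\partial c} &= \alpha\frac{\partial \epsilon}{\partial c} + \beta\frac{\partial \tau}{\partial c},\\
\epsilon &= \epsilon_0,\\
\tau &= \tau_0.
\end{aligned} \tag{77}
\]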


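Using the second and third equations to eliminate $\alpha$ and $\beta$ gives:

\[
\alpha = \frac{2h'(d_2)S_0'(p_{12}) - S_0'(p_{22})\,(h'(d_2) + h'(d_1))}{h'(d_2) - h'(d_1)},
\qquad
\beta = \frac{2\,(S_0'(p_{22}) - S_0'(p_{12}))}{h'(d_2) - h'(d_1)}. \tag{78}
\]

We also have $\alpha = S_0'(p_{22}) - \beta h'(d_2)$ and $S_0'(p_{11}) = 2S_0'(p_{12}) - S_0'(p_{22})$. Note that

\[ \frac{\partial \alpha}{\partial p_{12}} = -\beta c\,h''(d_2) - h'(d_2)\frac{\partial \beta}{\partial p_{12}} \;\to\; -h'(p_{22})\frac{\partial \beta}{\partial p_{12}} \tag{79} \]

as $c \searrow 0$.

We define $f$ as before, with $f_3 = \epsilon$ and $f_4 = \tau$, and compute

\[
\begin{aligned}
df_3 &= (c^2,\, 2c(1-c),\, (1-c)^2,\, 2cp_{11} + 2(1-2c)p_{12} - 2(1-c)p_{22})\\
&\to (0,\, 0,\, 1,\, 2(p_{12} - p_{22})),\\
df_4 &= (c^2h'(d_1),\, c(1-c)(h'(d_1) + h'(d_2)),\, (1-c)^2h'(d_2),\\
&\qquad h(d_1) - h(d_2) + c\,h'(d_1)(p_{11} - p_{12}) + (1-c)\,h'(d_2)(p_{12} - p_{22}))\\
&\to (0,\, 0,\, h'(p_{22}),\, h(p_{12}) - h(p_{22}) + h'(p_{22})(p_{12} - p_{22})).
\end{aligned} \tag{80}
\]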


The lower right block of $df$ then gives a contribution of $h(p_{12}) - h(p_{22}) + h'(p_{22})(p_{12} - p_{22}) -
2h'(p_{22})(p_{12} - p_{22}) = h(p_{12}) - h(p_{22}) - h'(p_{22})(p_{12} - p_{22}) = D(p_{22}, p_{12})$.

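As before, $\partial f_2/\partial p_{11} = 0$ when $c = 0$, so $\det(df) = S_0''(p_{11})\,(h(p_{12}) - h(p_{22}) - h'(p_{22})(p_{12} -
p_{22}))\,\partial f_2/\partial p_{12}$. Now

\[
\frac{\partial f_2}{\partial p_{12}} = \frac{\partial^2 S}{\partial c\,\partial p_{12}} - \alpha\frac{\partial^2 \epsilon}{\partial c\,\partial p_{12}} - \beta\frac{\partial^2 \tau}{\partial c\,\partial p_{12}} - \frac{\partial \alpha}{\partial p_{12}}\frac{\partial \epsilon}{\partial c} - \frac{\partial \beta}{\partial p_{12}}\frac{\partial \tau}{\partial c}. \tag{81}
\]

Since $\alpha$ and $\beta$ are independent of $c$, the first three terms are

\[
\frac{\partial}{\partial c}\left(\frac{\partial S}{\partial p_{12}} - \alpha\frac{\partial \epsilon}{\partial p_{12}} - \beta\frac{\partial \tau}{\partial p_{12}}\right) = \frac{\partial}{\partial c}(0) = 0, \tag{82}
\]

by the second variational equation. This leaves

\[
\frac{\partial f_2}{\partial p_{12}} = \Big(h'(p_{22})(2p_{12} - 2p_{22}) - \big(h(p_{12}) - h(p_{22}) + h'(p_{22})(p_{12} - p_{22})\big)\Big)\frac{\partial \beta}{\partial p_{12}}. \tag{83}
\]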


Combining with our earlier results, we have:

\[ \det(df) = -S_0''(p_{11})\,D(p_{22}, p_{12})^2\,\frac{\partial \beta}{\partial p_{12}}. \tag{84} \]

The expression $D(p_{22}, p_{12}) = h(p_{12}) - h(p_{22}) - h'(p_{22})(p_{12} - p_{22})$ has a double root at $p_{12} = p_{22}$
and is nonzero elsewhere, thanks to the monotonicity of $h'$.

As a last step, we consider when $\partial \beta/\partial p_{12}$ can be zero. Since $\beta = N'/D'$, we are interested
in when $(N'/D')' = 0$. But that is equivalent to having $N''/D'' = N'/D'$. Since we already
have $N/D = N'/D'$, this means that $\psi'' = (N/D)'' = 0$. Since we are looking at the value
of $\tilde\epsilon$ that maximizes $\psi$, having $\psi' = \psi'' = 0$ would imply $\psi''' = 0$ (or else $\tilde\epsilon$ would only be
a point of inflection, and not a local maximum). But if $(N/D)' = (N/D)'' = (N/D)''' = 0$,
then $N/D = N'/D' = N''/D'' = N'''/D'''$. Note that $N''$, $N'''$, $D''$ and $D'''$ are functions of
$\tilde\epsilon$ only, and are rational functions:

\[
\begin{aligned}
N'' &= 2S_0''(\tilde\epsilon) = -\frac{1}{\tilde\epsilon} - \frac{1}{1-\tilde\epsilon},\\
N''' &= 2S_0'''(\tilde\epsilon) = \frac{1}{\tilde\epsilon^2} - \frac{1}{(1-\tilde\epsilon)^2},\\
D'' &= h''(\tilde\epsilon),\\
D''' &= h'''(\tilde\epsilon).
\end{aligned} \tag{85}
\]

Setting $D''N''' = D'''N''$ gives a polynomial equation for $\tilde\epsilon$, which has only finitely many
roots. Since the equation $\psi' = 0$ is symmetric in $\epsilon$ and $\tilde\epsilon$, $\tilde\epsilon$ determines $\epsilon$, so there are only
finitely many values of $\epsilon$ for which $\partial \beta/\partial p_{12}$ is zero.

In summary, we exclude the finitely many values of $\epsilon$ for which $\psi$ achieves its maximum
more than once, and the finitely many values of $\epsilon$ for which $\partial \beta/\partial p_{12} = 0$. For all other
values of $\epsilon$, the optimizing graphon is bipodal of the prescribed form and unique.

□ 


7 Proof of Theorem 1.2 

The proof has three steps. 

Step 1. Showing that, for fixed $\epsilon$, $\Delta\tau$ can be approximated by the change in a positive linear
combination of $\tau_k$'s, as studied in the last section.

Step 2. Defining a set $B_H \subset (0,1)$ of "bad values", determined by analytic equations, such
that for all $\epsilon \notin B_H$ and for $\tau$ close enough to $\epsilon^\ell$, the optimizing graphon is unique
and bipodal and of the desired form.

Step 3. Showing that $B_H$ is finite.

Step 1. This is a repetition of the proof of Lemma 5.1. In the expansion of $\Delta\tau$, we get a
contribution $n_k\epsilon^{\ell-k}\Delta\tau_k$ from diagrams where all the edges associated with $\Delta g$ are connected
to a vertex of degree $k$, where $n_k$ is the number of vertices of $H$ of degree $k$. Summing over
$k$, and bounding the remaining terms by $O(\|\Delta g\|^3)$, as before, we have

\[ \Delta\tau = \sum_k n_k\epsilon^{\ell-k}\Delta\tau_k + O(\Delta\tau^{3/2}). \tag{86} \]
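The coefficients of this linear combination depend only on the degree sequence of $H$. As an illustration, the Python sketch below reads off the numbers $n_k$ from an edge list and returns the weights $n_k\epsilon^{\ell-k}$ attached to each $\Delta\tau_k$. Degree-1 vertices are dropped because $\tau_1$ is the edge density, which is held fixed, so $\Delta\tau_1 = 0$. The function name and the two sample graphs are hypothetical examples.

```python
from collections import Counter


def star_coefficients(edges, eps):
    """Return {k: n_k * eps**(l - k)} for the degrees k >= 2 occurring in H.

    edges: list of (u, v) pairs describing the simple graph H
    eps:   edge density
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = Counter(degree.values())          # n[k] = number of vertices of degree k
    ell = len(edges)                      # number of edges of H
    return {k: n_k * eps ** (ell - k) for k, n_k in sorted(n.items()) if k >= 2}


triangle = [(0, 1), (1, 2), (2, 0)]
path_with_three_edges = [(0, 1), (1, 2), (2, 3)]

triangle_weights = star_coefficients(triangle, 0.4)
path_weights = star_coefficients(path_with_three_edges, 0.4)
```

For the triangle this gives a single weight $3\epsilon$ on $\Delta\tau_2$, and for the path with three edges a single weight $2\epsilon$ on $\Delta\tau_2$, so both graphs lead to models that are multiples of the 2-star model at leading order.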

Step 2. For fixed $\epsilon$, we consider a model whose density is $\sum_k n_k\epsilon^{\ell-k}\tau_k$. As long as $\psi(\epsilon, \tilde\epsilon)$
for this model achieves its maximum at a unique value of $\tilde\epsilon$, and as long as $\partial \beta/\partial p_{12} \ne 0$ when
$p_{12}$ equals this value of $\tilde\epsilon$, the proofs of Theorems 1.1 and 6.1 carry over almost verbatim.

That is, the model problem has a unique bipodal maximizer by the reasoning of Theorem
6.1. The entropy maximizer for the actual problem involving $H$ must approximate the
entropy maximizer for the model problem, and in particular must be approximately bipodal,
and so can be written as $g_b + \Delta g_f$, where $\Delta g_f$ averages to zero on each quadrant. The same
arguments as in the proof of Theorem 1.1 show that $\Delta g_f$ is pointwise small. By a power
series expansion, $(s(g_b + \Delta g_f) - s(g_b))/(\tau(g_b + \Delta g_f) - \tau(g_b)) < \beta$, so for small $c$ we can
increase the entropy by setting $\Delta g_f$ to zero and varying the bipodal data to achieve the
correct value of $\tau$.


Step 3. For any fixed $\epsilon$, the model problem has only a finite number of bad values of $\epsilon$,
but this is not enough to prove that $B_H$ is finite. Rather

\[ B_H = \{\epsilon \mid \epsilon \text{ is one of the bad points for the model with } a_k = n_k\epsilon^{\ell-k}\}, \tag{87} \]

where a value of $\epsilon$ is bad for a model if either $\psi$ has multiple maxima or if $\partial \beta/\partial p_{12} = 0$. Since
the bad points for any linear combination of $k$-stars depend analytically on the coefficients
of that linear combination, and since these coefficients are powers of $\epsilon$, the set $B_H$ is cut out
by analytic equations in $\epsilon$.

As such, $B_H$ is either the entire interval $(0,1)$, or a finite set, or a countable set with
limit points only at 0 and/or 1. We will show that neither 0 nor 1 is a limit point of $B_H$,
implying that $B_H$ is finite.

Let $k_{max}$ be the largest degree of any vertex in $H$, and consider the model problem with
$h(x) = \sum_{k=2}^{k_{max}} a_k x^k$, where $a_k = n_k\epsilon^{\ell-k}$. We begin with some constraints on the values of $\tilde\epsilon$
for which $\psi' = 0$.

Lemma 7.1. Suppose that $\psi'(\epsilon, \tilde\epsilon) = 0$. If $\tilde\epsilon = \epsilon$, or if $\partial \beta/\partial p_{12} = 0$ when $p_{22} = \epsilon$ and
$p_{12} = \tilde\epsilon$, then $1/2 \le \tilde\epsilon \le (k_{max} - 1)/k_{max}$.

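Proof of lemma. In both cases we are looking for solutions to $N''D''' = N'''D''$. Since $N'' =
2S_0''(\tilde\epsilon)$, $N''' = 2S_0'''(\tilde\epsilon)$, $D'' = h''(\tilde\epsilon)$ and $D''' = h'''(\tilde\epsilon)$, this equation does not involve $\epsilon$ (except
insofar as the coefficients of $h$ depend on $\epsilon$). We have

\[
\begin{aligned}
\frac{2S_0'''(\tilde\epsilon)}{2S_0''(\tilde\epsilon)} &= \frac{h'''(\tilde\epsilon)}{h''(\tilde\epsilon)},\\
\frac{1}{1-\tilde\epsilon} - \frac{1}{\tilde\epsilon} &= \frac{h'''(\tilde\epsilon)}{h''(\tilde\epsilon)},\\
\frac{2\tilde\epsilon - 1}{1-\tilde\epsilon} &= \frac{\tilde\epsilon\,h'''(\tilde\epsilon)}{h''(\tilde\epsilon)} = \frac{\sum_k k(k-1)(k-2)\,a_k\tilde\epsilon^{k-2}}{\sum_k k(k-1)\,a_k\tilde\epsilon^{k-2}}.
\end{aligned} \tag{88}
\]

The right hand side of the last line is a weighted average of $k - 2$ with weights $k(k-1)a_k\tilde\epsilon^{k-2}$,
and so is at least zero and at most $k_{max} - 2$. Thus $(1 - \tilde\epsilon)^{-1}$ is between 2 and $k_{max}$ and $\tilde\epsilon$ is
between $1/2$ and $(k_{max} - 1)/k_{max}$. □

Lemma 7.2. If $\psi'(\epsilon, \tilde\epsilon) = 0$, and if $\epsilon$ is sufficiently close to 1, then $\tilde\epsilon$ is uniquely defined and
approaches 0 as $\epsilon \to 1$. Likewise, if $\epsilon$ is sufficiently close to 0, then $\tilde\epsilon$ is uniquely defined and
approaches 1 as $\epsilon \to 0$.
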
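Proof. When $\epsilon < 1/2$, or when $\epsilon > (k_{max} - 1)/k_{max}$, we cannot have $\tilde\epsilon = \epsilon$, so the equation
$\psi' = 0$ is equivalent to $ND' = DN'$ and $\tilde\epsilon \ne \epsilon$. Writing $DN' - ND' = 0$ explicitly, and
doing some simple algebra, yields the equation

\[
S_0'(\epsilon)\,[h(\epsilon) - h(\tilde\epsilon) - (\epsilon - \tilde\epsilon)h'(\tilde\epsilon)] - S_0'(\tilde\epsilon)\,[h(\epsilon) - h(\tilde\epsilon) - (\epsilon - \tilde\epsilon)h'(\epsilon)] + (S_0(\tilde\epsilon) - S_0(\epsilon))(h'(\epsilon) - h'(\tilde\epsilon)) = 0. \tag{89}
\]

If $\epsilon$ approaches 0 or 1 and $\tilde\epsilon$ does not, then the first term diverges, while the other terms do
not, insofar as $S_0'$ has singularities at 0 and 1 but $S_0$, $h$ and $h'$ do not. Thus $\tilde\epsilon$ must go to 0
or 1 as $\epsilon$ goes to 0 or 1.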

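We next rule out the possibility that both $\epsilon$ and $\tilde\epsilon$ approach 1. Suppose that $\epsilon$ is close
to 1. We expand both $N$ and $D$ in powers of $(\tilde\epsilon - \epsilon)$:

\[
\frac{N}{D} = \frac{2\sum_{m \ge 2} S_0^{(m)}(\epsilon)\,(\tilde\epsilon - \epsilon)^m/m!}{\sum_{m \ge 2} h^{(m)}(\epsilon)\,(\tilde\epsilon - \epsilon)^m/m!}, \tag{90}
\]

where $S_0^{(m)}$ and $h^{(m)}$ denote $m$th derivatives. Note that the coefficients of the numerator
grow rapidly with $m$, while the growth of the coefficients of the denominator depends only on
the degree of $h$. For $\tilde\epsilon > \epsilon > (k_{max} - 1)/k_{max}$, $\psi = N/D$ is a decreasing function of $\tilde\epsilon$ (that
is, negative and increasing in magnitude), so we cannot have $\psi' = 0$. Since the equation
$\psi' = 0$ is symmetric in $\epsilon$ and $\tilde\epsilon$ (apart from the dependence of the coefficients of $h$ on $\epsilon$), we
also cannot have $\epsilon > \tilde\epsilon > (k_{max} - 1)/k_{max}$.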

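When $\epsilon$ is close to 1, we must thus have $\tilde\epsilon$ close to 0. But then $N \approx 2S_0'(\epsilon)$, $D \approx
h'(\epsilon) - h(\epsilon)$, $D' \approx -h'(\epsilon)$, and the equation

\[ 2S_0'(\tilde\epsilon) = N' + 2S_0'(\epsilon) = 2S_0'(\epsilon) + ND'/D \tag{91} \]

determines $S_0'(\tilde\epsilon)$, and therefore $\tilde\epsilon$, uniquely as a function of $\epsilon$.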

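Next we consider $\epsilon \to 0$. If $H$ is 2-starlike, then $\psi$ is a multiple of $\psi_2$, and the result is
already known. Otherwise, it is convenient to define a new polynomial $\bar h(z) = \sum_k n_k z^k$, so
that $h(x) = \epsilon^\ell\,\bar h(x/\epsilon)$. Then

\[
\begin{aligned}
D &= h(\tilde\epsilon) - h(\epsilon) - h'(\epsilon)(\tilde\epsilon - \epsilon)\\
&= \epsilon^\ell\,[\bar h(r) - \bar h(1) - \bar h'(1)(r - 1)],
\end{aligned} \tag{92}
\]

where $r := \tilde\epsilon/\epsilon$. Likewise,

\[
N = -\big[\tilde\epsilon\ln\tilde\epsilon - \epsilon\ln\epsilon + (1-\tilde\epsilon)\ln(1-\tilde\epsilon) - (1-\epsilon)\ln(1-\epsilon) - (\tilde\epsilon - \epsilon)(\ln\epsilon - \ln(1-\epsilon))\big]. \tag{93}
\]

Since $\tilde\epsilon$ and $\epsilon$ are small, we can approximate $\ln(1-\tilde\epsilon)$ and $\ln(1-\epsilon)$ as $-\tilde\epsilon$ and $-\epsilon$, respectively,
giving

\[ N \approx -\epsilon\,[r\ln r - r + 1] + \epsilon^2(r - r^2). \tag{94} \]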
Note that the ratio $\psi = N/D$ is negative. Since $h$ is a polynomial of degree at least 3, $D$
grows faster than $N$ as $r \to \infty$, so we can always increase $\psi$ by taking larger and larger
values of $r = \tilde\epsilon/\epsilon$. This argument only breaks down when the approximation $\ln(1 - \tilde\epsilon) \approx -\tilde\epsilon$
breaks down, i.e. at values of $\tilde\epsilon$ that are no longer close to 0. Thus we cannot have $\epsilon$ and $\tilde\epsilon$
both close to zero.

Finally, if $\epsilon$ is close to 0 and $\tilde\epsilon$ is close to 1, then $h(\epsilon)$ and $h'(\epsilon)$ are close to zero, while
$h(\tilde\epsilon)$ is close to a multiple of $\tilde\epsilon^{k_{max}}$, since the coefficient of $x^{k_{max}}$ is $O(1/\epsilon)$ larger than any
other coefficient. Thus $\psi$ behaves like $\psi_{k_{max}}$, which has a unique maximizer. □

We have shown that when $\epsilon$ is close to 0 or 1, $\psi$ has a unique maximizer. Furthermore,
$\tilde\epsilon$ is not between $1/2$ and $(k_{max} - 1)/k_{max}$, so $\partial \beta/\partial p_{12} \ne 0$. So $\epsilon \notin B_H$, completing Step 3
and the proof of Theorem 1.2.
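The interval in Lemma 7.1 can be illustrated numerically. The condition $N''D''' = N'''D''$ reduces to $(2\tilde\epsilon - 1)/(1 - \tilde\epsilon) = \tilde\epsilon\,h'''(\tilde\epsilon)/h''(\tilde\epsilon)$, and the Python sketch below solves this equation by bisection for a few polynomials $h$ and checks that the root lies in $[1/2, (k_{max}-1)/k_{max}]$. The sample coefficients, the bracketing interval and the function names are assumptions made for illustration.

```python
def lemma_gap(coeffs, t):
    """(2t - 1)/(1 - t) - t h'''(t)/h''(t) for h(x) = sum_k coeffs[k] x^k."""
    h2 = sum(k * (k - 1) * a * t ** (k - 2)
             for k, a in coeffs.items() if k >= 2)
    t_h3 = sum(k * (k - 1) * (k - 2) * a * t ** (k - 2)
               for k, a in coeffs.items() if k >= 2)
    return (2 * t - 1) / (1 - t) - t_h3 / h2


def bisect_root(fun, lo, hi, iters=100):
    """Plain bisection; assumes fun changes sign between lo and hi."""
    f_lo = fun(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = fun(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exceptional_value(coeffs):
    """A root of the Lemma 7.1 condition in (0.01, 0.99)."""
    return bisect_root(lambda t: lemma_gap(coeffs, t), 0.01, 0.99)


root_two_star = exceptional_value({2: 1.0})                   # h = x^2
root_three_star = exceptional_value({3: 1.0})                 # h = x^3
root_mixed = exceptional_value({2: 1.2, 3: 1.0, 5: 0.5})      # k_max = 5
```

For a pure $k$-star the root is $(k-1)/k$, matching the exceptional edge density in Theorem 4.1, and for the mixed polynomial the root falls between $1/2$ and $4/5$.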

8 Conclusions 

We have shown that just above the ER curve, entropy maximizing graphons, constrained by
the densities of edges and any one other subgraph $H$, exhibit the same qualitative behavior
for all $H$ and for (almost) all values of $\epsilon$. The optimizing graphon is unique and bipodal.

These results were proven by perturbation theory, using the fact that the optimizing
graphon has to be $L^2$-close to a constant (Erdős-Rényi) graphon. Surprisingly, the optimizing
graphon is not pointwise close to constant. Rather, it is bipodal, with a small cluster of
size $O(\Delta\tau)$. As $\Delta\tau$ approaches 0, the size of the small cluster shrinks, but the values of
the graphon on each quadrant do not approach one another. Rather, $p_{22}$ approaches $\epsilon$,
$p_{12}$ approaches the value of $\tilde\epsilon$ that maximizes a specific function $\psi(\epsilon, \tilde\epsilon)$, and $p_{11}$ satisfies
$S_0'(p_{11}) - 2S_0'(p_{12}) + S_0'(p_{22}) = 0$.

Finally, the asymptotic behavior of these graphons as $\tau \to \epsilon^\ell$ depends only on the degree
sequence of $H$. In particular, the cases where $H$ is a triangle and when $H$ is a 2-star are
asymptotically the same. This is illustrated in Figure 2. Since $\Delta\tau_{triangle} \approx 3\epsilon\,\Delta\tau_2$, the
optimizing graphon for the 2-star model with $\epsilon = 0.4$ and $\Delta\tau_2 = 0.002$ should resemble
the optimizing graphon for the triangle model with $\epsilon = 0.4$ and $\Delta\tau_{triangle} = 0.0024$. These
optimizing graphons are obtained using the algorithms we developed in [12] without assuming
bipodality. Numerical estimates indicate that the optimizing graphons are not exactly the
same, thanks to $O(\Delta\tau_2^{3/2})$ corrections to $\Delta\tau_{triangle}$, but are still qualitatively similar.
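The numbers in this comparison follow from the leading-order relation $\Delta\tau \approx n_k\epsilon^{\ell-k}\Delta\tau_k$ applied to the triangle. A short Python check of the arithmetic, with variable names chosen here for illustration:

```python
eps = 0.4
d_tau2 = 0.002

# A triangle has l = 3 edges and n_2 = 3 vertices of degree k = 2.
n2, ell, k = 3, 3, 2
d_tau_triangle = n2 * eps ** (ell - k) * d_tau2   # 3 * eps * d_tau2

# Densities themselves: the Erdos-Renyi values eps^2 and eps^3 plus the excess.
tau2 = eps ** 2 + d_tau2
tau_triangle = eps ** 3 + d_tau_triangle
```

These reproduce $\Delta\tau_{triangle} = 0.0024$, and the resulting densities $\tau_2 = 0.162$ and $\tau_{triangle} = 0.0664$ are the values quoted in the caption of Figure 2.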
Figure 2: Numerical estimates of the optimizing graphon for the 2-star model with $\epsilon = 0.4$
and $\tau_2 = 0.1620$ (left) and the optimizing graphon for the triangle model with $\epsilon = 0.4$ and
$\tau_{triangle} = 0.0664$ (right). (Although theoretically we have not tried to prove that $\Delta\tau_2 = 0.002$
is small enough to fit into the interval provided by Theorem 1.1, numerically it appears to
be the case.)